ESTIMATION OF CONDITIONAL CUMULATIVE DISTRIBUTION 
FUNCTION FROM CURRENT STATUS DATA 



SANDRA PLANCADE 

Abstract. Consider a positive random variable of interest Y depending on a covariate
X, and a random observation time T independent of Y given X. Assume that the only
knowledge available about Y is its current status at time T: $\delta = \mathbb{1}_{\{Y \le T\}}$. This paper
presents a procedure to estimate the conditional cumulative distribution function F of
Y given X from an independent identically distributed sample of $(X, T, \delta)$.

A collection of finite-dimensional linear subspaces of $L^2(\mathbb{R}^2)$, called models, is built as
tensor products of classical approximation spaces of $L^2(\mathbb{R})$. Then a collection of
estimators of F is constructed by minimization of a regression-type contrast on each model,
and a data-driven procedure allows one to choose an estimator among the collection. We
show that the selected estimator converges as fast as the best estimator in the collection
up to a multiplicative constant and is minimax over anisotropic Besov balls. Finally,
simulation results illustrate the performance of the estimator and highlight parameters
that impact the estimation accuracy.
1. Introduction

In some survival analysis studies, the observation of a positive variable of interest Y,
called lifetime, is restricted to the knowledge of whether or not Y exceeds a random
measurement time T. We only observe the time T and the "current status" of the system at time T,
namely $\delta = \mathbb{1}_{\{Y \le T\}}$. Such data arise naturally, for example, in infectious disease studies,
when the time Y of infection is unobserved and a test is carried out at time T. This
framework is also called interval censoring (case 1), since the observation $(T, \delta)$ indicates
whether Y lies in $[0, T]$ or $(T, +\infty)$. The lifetime Y and the observation time T may
depend on observed covariates X, and Y and T are usually assumed to be independent
given X.

Current status data have been widely studied over the last two decades. Most results
about nonparametric estimation of the survival function are based on the NPMLE
(Nonparametric Maximum Likelihood Estimator). Groeneboom and Wellner [1992] prove that the
NPMLE is pointwise convergent at rate $n^{-1/3}$, which is the optimal rate, and van de Geer
[1993] establishes a similar result for the $L^2$-risk. This unusual rate of convergence
differs from the uncensored and right-censored cases, in which the distribution function can
be estimated with the parametric rate of convergence $n^{-1/2}$. Besides, as far as the
author knows, no minimax rate of convergence has been computed on classical regularity
spaces. More recently, estimators derived from the NPMLE have made it possible to take into account
the known regularity of the function. Hudgens et al. [2007] build three estimators derived
from the NPMLE and compare their performances on simulated and real data. van der
Vaart and van der Laan [2006] apply smoothing methods to the NPMLE to estimate
the survival function from current status data in the presence of high-dimensional covariates.
Birgé [1999] proposes an easily computable histogram estimator which reaches the
minimax rate of convergence. Nevertheless, the procedures proposed in these papers are not
adaptive on classical regularity spaces. Few results about adaptivity are available, and
they do not include covariates: Ma and Kosorok [2006] introduce an NPMLE and a least-squares
estimator on Sobolev classes, and select the regularity parameter with a penalized
criterion. Brunel and Comte [2009] consider a least-squares estimator on classical bases and
introduce a model selection procedure with a more easily computable penalty function.

We consider an i.i.d. sample $(X_i, Y_i)_{i=1,\dots,n}$, where the $X_i$'s are i.i.d. random variables
with common density $f_X$, and the $Y_i$'s are positive variables called survival times. For
every i, $Y_i$ depends on $X_i$, and we denote by $F(x,y)$ the cumulative distribution function
(c.d.f.) of $Y_i$ given $X_i$, namely

$$F(x,y) = \mathbb{P}[Y \le y \mid X = x],$$

where $\mathbb{P}[E_1 \mid E_2]$ denotes the conditional probability of $E_1$ given $E_2$. We consider an i.i.d.
sample $(T_i)_{i=1,\dots,n}$ of positive random variables such that, for every $i \in \{1, \dots, n\}$, $T_i$ and
$Y_i$ are independent given $X_i$, and we observe the sample

$$\big(X_i, T_i, \delta_i = \mathbb{1}_{\{Y_i \le T_i\}}\big)_{i=1,\dots,n}. \tag{1}$$

This paper presents an estimator of the conditional cumulative distribution function F
from the sample (1). The estimation procedure, inspired by Brunel and Comte [2009],
is based on the following heuristic. For every $(x,u)$,

$$\mathbb{E}[\delta \mid (X,T) = (x,u)] = \mathbb{E}[\mathbb{1}_{\{Y \le u\}} \mid (X,T) = (x,u)].$$

Given $X = x$, Y and T are independent, thus

$$\mathbb{E}[\delta \mid (X,T) = (x,u)] = \mathbb{E}[\mathbb{1}_{\{Y \le u\}} \mid X = x] = \mathbb{P}[Y \le u \mid X = x] = F(x,u).$$

Thus F is the regression function of $\delta$ on $(X, T)$, and the interval censoring problem turns
into a regression function estimation problem in which all the variables involved, $(X, T, \delta)$, are
observed. Therefore we can apply methods developed for regression function estimation
to our problem.
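The heuristic above can be checked numerically. The sketch below is illustrative only: it borrows the simulation design of Section 5.2 ($X$ uniform on $[0,3]$, $Y = X + \varepsilon$, $T = X + \varepsilon'$ with $\varepsilon, \varepsilon' \sim \mathrm{Exp}(1)$, assumed independent of each other and of X), and compares the mean of $\delta$ over the observations whose $(X, T)$ falls in a small box around a point $(x_0, u_0)$ with $F(x_0, u_0) = 1 - \exp(-(u_0 - x_0))$. The function name, box half-width and sample size are arbitrary choices.

```python
import numpy as np


def check_regression_identity(x0=1.5, u0=2.5, n=1_000_000, width=0.05, seed=0):
    """Compare a local average of delta near (x0, u0) with F(x0, u0).

    Assumed model (borrowed from Section 5.2): X ~ U([0, 3]),
    Y = X + eps, T = X + eps', with eps, eps' ~ Exp(1) independent of
    each other and of X, so that Y and T are independent given X.
    """
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.0, 3.0, n)
    y = x + rng.exponential(1.0, n)      # lifetime, never observed
    t = x + rng.exponential(1.0, n)      # observation time
    delta = (y <= t).astype(float)       # current status indicator
    # Local average of delta over (X, T) in a small box around (x0, u0).
    near = (np.abs(x - x0) < width) & (np.abs(t - u0) < width)
    local_mean = delta[near].mean()
    # F(x, u) = P[Y <= u | X = x] = 1 - exp(-(u - x)) for u >= x.
    true_F = 1.0 - np.exp(-(u0 - x0))
    return local_mean, true_F, int(near.sum())


local_mean, true_F, count = check_regression_identity()
print(f"local mean of delta: {local_mean:.3f}  F(x0, u0): {true_F:.3f}  ({count} points)")
```

The local average is a crude kernel-type regression estimate; the paper instead uses projection estimators on finite-dimensional models, introduced in Section 2.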

More precisely, we consider a collection of linear subspaces of $L^2(\mathbb{R}^2)$ and build an estimator
by minimization of a least-squares contrast on each subspace. Then a model selection criterion
provides an estimator which converges as fast as the best estimator among the collection,
up to a multiplicative constant: the estimator is said to be adaptive. We first state
adaptivity conditionally on the observed $(X_i,T_i)$'s under weak assumptions, for the
empirical norm

$$\|\hat F - F\|_n^2 = \frac{1}{n}\sum_{i=1}^n (\hat F - F)^2(X_i, T_i),$$

via a nonclassical use of Talagrand's inequality. The empirical risk $\|\hat F - F\|_n^2$ arises
naturally since it indicates the quality of estimation on the set of observations $\{(X_i,T_i)\}$.
Besides, considering the empirical norm allows a direct transposition of the result to nonrandom
observation times $(T_i)$'s and covariates $(X_i)$'s. Then the integrated risk is controlled
under additional assumptions about the collection of models as well as minimal
regularity conditions on F. Nevertheless, considering the integrated risk enables us to
conduct a minimax study and prove that our estimator is optimal over anisotropic Besov
balls $B^{\beta}_{2,\infty}(L)$.

The paper is organised as follows. Section 2 introduces the tools involved in the esti- 
mation procedure. The definition of the estimator and the main result are presented in 
Section 3. In Section 4, we study the rate of convergence of the estimator over anisotropic 
Besov balls and prove that it is minimax. A numerical study is conducted on simulated 
data in Section 5. Section 6 is devoted to the proofs. Section 7 gathers the Talagrand 
deviation inequality used in the proofs and a linear algebra technical lemma. 



2. Tools 

2.1. Notations. For all i.i.d. random variables $(V_i, W_i)$, we denote by $f_V$ the density
of $V_i$, by $f_{(V,W)}$ the density of the couple $(V_i,W_i)$ and by $f_{V|W}(v,w)$ the conditional
density of $V_i$ at v given $W_i = w$, for every $i = 1, \dots, n$.

The conditional cumulative distribution function $F(x, y)$ is estimated on a compact set
$A = A_1 \times A_2$, where $A_1$ is a compact interval of $\mathbb{R}$ and $A_2 = [0, a_2]$ for some positive $a_2$.

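For every $s, t \in L^2(A)$, let

$$(s,t)_n = \frac{1}{n}\sum_{i=1}^n s(X_i,T_i)\,t(X_i,T_i) \quad\text{and}\quad \|s\|_n^2 = (s,s)_n.$$

The expectations of these quantities are denoted by

$$(s,t)_{f_{(X,T)}} = \iint s(x,u)\,t(x,u)\,f_{(X,T)}(x,u)\,dx\,du \quad\text{and}\quad \|s\|_{f_{(X,T)}}^2 = (s,s)_{f_{(X,T)}}.$$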

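Let M be a symmetric matrix of dimension $d \times d$ with non-negative coefficients. We denote
by $\rho$ the spectral radius of M:

$$\rho(M) = \sup_{\sum_{i=1}^d a_i^2 = 1} \Big|\sum_{i,j=1}^d M_{ij}\,a_i a_j\Big| = \sup_{\sum_{i=1}^d a_i^2 = 1} \sum_{i,j=1}^d M_{ij}\,|a_i|\,|a_j|.$$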
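As a numerical sanity check of the two expressions for $\rho(M)$ (a sketch, not part of the estimation procedure), one can compare the largest absolute eigenvalue of a symmetric matrix with non-negative coefficients with a random search over unit vectors with non-negative entries. The helper names are illustrative.

```python
import numpy as np


def spectral_radius(M):
    """rho(M) for a symmetric matrix: largest absolute eigenvalue."""
    return float(np.max(np.abs(np.linalg.eigvalsh(M))))


def random_search_quadratic_sup(M, trials=20_000, seed=0):
    """Lower estimate of sup_{||a||_2 = 1} sum_ij M_ij |a_i| |a_j| by random search."""
    rng = np.random.default_rng(seed)
    a = np.abs(rng.standard_normal((trials, M.shape[0])))   # |a_i| >= 0
    a /= np.linalg.norm(a, axis=1, keepdims=True)          # unit Euclidean norm
    values = np.einsum("ti,ij,tj->t", a, M, a)               # a^T M a for each trial
    return float(values.max())


M = np.array([[2.0, 1.0], [1.0, 2.0]])
print(spectral_radius(M), random_search_quadratic_sup(M))
```

Restricting the search to non-negative vectors is enough here because, when the coefficients of M are non-negative, replacing each $a_i$ by $|a_i|$ can only increase the quadratic form.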
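2.2. Collection of models. We construct a collection of finite-dimensional linear subspaces
of $L^2(A)$, called models, as tensor products of models on $A_1$ and $A_2$. For j = 1 or 2, consider
a collection of linear subspaces of $L^2(A_j)$,
$\mathcal{M}_n^{(j)} = \{S^{(j)}_{m_j},\ m_j \in I_n^{(j)}\}$, where $\dim(S^{(j)}_{m_j}) = D^{(j)}_{m_j} < +\infty$.
Then for every $m = (m_1, m_2) \in I_n = I_n^{(1)} \times I_n^{(2)}$, we define

$$S_m = S^{(1)}_{m_1} \otimes S^{(2)}_{m_2} = \Big\{ t : A \to \mathbb{R},\ t(x,y) = \sum_{(k,l) \in J_m} a_{k,l}\,\varphi^{m_1}_k(x)\,\psi^{m_2}_l(y) \Big\},$$

where $(\varphi^{m_1}_k)_{k=1,\dots,D^{(1)}_{m_1}}$ is an orthonormal basis of $S^{(1)}_{m_1}$, $(\psi^{m_2}_l)_{l=1,\dots,D^{(2)}_{m_2}}$ is an orthonormal
basis of $S^{(2)}_{m_2}$, and

$$J_m = \big((1,1), \dots, (1,D^{(2)}_{m_2}), (2,1), \dots, (2,D^{(2)}_{m_2}), \dots, (D^{(1)}_{m_1},1), \dots, (D^{(1)}_{m_1},D^{(2)}_{m_2})\big). \tag{2}$$

The model $S_m$ has dimension $D_m = D^{(1)}_{m_1} D^{(2)}_{m_2}$, and we denote by $\mathcal{M}_n = \{S_m,\ m \in I_n\}$ the collection of models.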

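We consider the following assumption, which restricts the number of models in the collections.

(H) Let j = 1 or 2. For every $\theta > 0$, there exists a constant $B_j$ such that

$$\sum_{m_j \in I_n^{(j)}} \exp\Big(-\theta \sqrt{D^{(j)}_{m_j}}\Big) \le B_j, \quad \forall n \in \mathbb{N}^*.$$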

2.3. Regression-type contrast. The contrast is based on the following result, which
generalizes the heuristic presented in Section 1. It is proved in Section 6.1.

Lemma 2.1. Almost surely,

$$\mathbb{E}[\delta_i\, t(X_i,T_i) \mid (X_i,T_i)] = F(X_i,T_i)\, t(X_i,T_i).$$

As already noticed in the introduction, taking t = 1, this amounts to saying that F
is the regression function of $\delta_i$ on $(X_i,T_i)$. Thus we consider the classical least-squares
contrast for regression function estimation:

$$\gamma_n(t) = \frac{1}{n}\sum_{i=1}^n \big(t(X_i,T_i) - \delta_i\big)^2,$$

which measures the accuracy of the approximation of the $\delta_i$'s by the $t(X_i,T_i)$'s (see
e.g. Baraud [2002]). Let us explain more precisely why it is relevant. For every $t \in L^2(A)$,

$$\gamma_n(t) = \|t\|_n^2 + \|\delta\|_n^2 - \frac{2}{n}\sum_{i=1}^n \delta_i\, t(X_i,T_i), \tag{3}$$

where $\|\delta\|_n^2 = (1/n)\sum_{i=1}^n \delta_i^2$. Thus

$$\mathbb{E}[\gamma_n(t)] = \mathbb{E}\big[\|t\|_n^2\big] - \frac{2}{n}\sum_{i=1}^n \mathbb{E}[\delta_i\, t(X_i,T_i)] + C,$$
with $C = \mathbb{E}[\delta_1]$ independent of t (since $\delta_i^2 = \delta_i$). Therefore, by Lemma 2.1,

$$\mathbb{E}[\gamma_n(t)] = \|t\|_{f_{(X,T)}}^2 - 2(F,t)_{f_{(X,T)}} + C = \|F - t\|_{f_{(X,T)}}^2 + C',$$

with $C' = C - \|F\|_{f_{(X,T)}}^2$ independent of t. This shows that minimizing $t \mapsto \mathbb{E}[\gamma_n(t)]$ is equivalent to
minimizing $t \mapsto \|F - t\|_{f_{(X,T)}}^2$, so that minimizing $\gamma_n$ should provide a function close to F.



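2.4. Minimum contrast estimators. For every model $S_m \in \mathcal{M}_n$ we consider the estimator

$$\hat F_m = \arg\min_{t \in S_m} \gamma_n(t). \tag{4}$$

The definition (4) amounts to stating that $\partial \gamma_n(\hat F_m)/\partial a_{k,l} = 0$ for every $(k,l) \in J_m$.
Then, writing $\hat F_m(x,u) = \sum_{(k,l)\in J_m} a_{k,l}\,\varphi^{m_1}_k(x)\,\psi^{m_2}_l(u)$, the coefficient column vector
$A_m = [a_{k,l}]_{(k,l)\in J_m}$ satisfies

$$G_m A_m = V_m, \tag{5}$$

where

$$G_m = \Big[\frac{1}{n}\sum_{i=1}^n \varphi^{m_1}_k(X_i)\,\psi^{m_2}_l(T_i)\,\varphi^{m_1}_{k'}(X_i)\,\psi^{m_2}_{l'}(T_i)\Big]_{(k,l),(k',l') \in J_m}$$

is the $D_m \times D_m$ Gram matrix related to $\{\varphi^{m_1}_k \psi^{m_2}_l\}_{(k,l)\in J_m}$ and the scalar product
$(\cdot,\cdot)_n$, and

$$V_m = \Big[\frac{1}{n}\sum_{i=1}^n \delta_i\,\varphi^{m_1}_k(X_i)\,\psi^{m_2}_l(T_i)\Big]_{(k,l)\in J_m}$$

is a column vector.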



Comment. As the matrix $G_m$ is not always invertible, equation (5) does not provide a
unique solution $A_m$. Nevertheless, consider an observed sample (1). Let $\mathcal{S}_m$ be the subset
of $\mathbb{R}^n$ defined by

$$\mathcal{S}_m = \{(t(X_1,T_1), \dots, t(X_n,T_n)),\ t \in S_m\}$$

and $Z_m = \arg\min_{z \in \mathcal{S}_m} (1/n)\sum_{i=1}^n (z_i - \delta_i)^2$. Then $Z_m$ is the projection of $(\delta_1, \dots, \delta_n)$ on $\mathcal{S}_m$
for the canonical norm on $\mathbb{R}^n$, so $Z_m$ is uniquely defined. Moreover, by definition of $\mathcal{S}_m$,
there exists at least one function $G \in S_m$ such that $Z_m = (G(X_1,T_1), \dots, G(X_n,T_n))$.
Then G minimises $\gamma_n(t)$ on $S_m$. Moreover, if two such functions G exist, they are equal
on the set $\{(X_i,T_i)\}$, so $\|\hat F_m - F\|_n^2$ remains the same. For that reason, the definition (4)
of $\hat F_m$ is sensible for the risk $\mathbb{E}\big[\|\hat F_m - F\|_n^2 \mid \{(X_i,T_i)\}_{i=1,\dots,n}\big]$.
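The sketch below illustrates (4)-(5) and the Comment above for one particular choice of models, regular histograms on each axis (an assumption made for illustration; the paper allows other bases). It builds the design matrix whose columns are $\varphi^{m_1}_k(X_i)\psi^{m_2}_l(T_i)$ in the order $J_m$, forms $G_m$ and $V_m$, and calls a least-squares solver, which returns one solution of (5) even when $G_m$ is singular (for instance when a cell contains no observation); the fitted values $\hat F_m(X_i,T_i)$ do not depend on that choice. Observations outside the box get zero basis values. The function names are hypothetical.

```python
import numpy as np


def histogram_basis(z, lo, hi, D):
    """Values at z of the D L^2([lo, hi])-orthonormal histogram functions.

    phi_k(z) = sqrt(D / (hi - lo)) * 1{z in k-th regular bin of [lo, hi]};
    points outside [lo, hi] get zero in every column. Shape: (len(z), D).
    """
    z = np.asarray(z, dtype=float)
    B = np.zeros((z.size, D))
    inside = np.flatnonzero((z >= lo) & (z <= hi))
    bins = np.minimum(((z[inside] - lo) / (hi - lo) * D).astype(int), D - 1)
    B[inside, bins] = np.sqrt(D / (hi - lo))
    return B


def fit_tensor_model(x, t, delta, box, D1, D2):
    """Minimum contrast estimator (4) on S_m = S^(1) (x) S^(2) via equation (5).

    box = (x_lo, x_hi, t_lo, t_hi) plays the role of A = A1 x A2.
    Returns the coefficient vector A_m (ordered as J_m) and the fitted values.
    """
    x_lo, x_hi, t_lo, t_hi = box
    Phi = histogram_basis(x, x_lo, x_hi, D1)          # phi_k(X_i)
    Psi = histogram_basis(t, t_lo, t_hi, D2)          # psi_l(T_i)
    n = len(delta)
    # Column (k, l) holds phi_k(X_i) * psi_l(T_i); row-major order matches J_m.
    design = (Phi[:, :, None] * Psi[:, None, :]).reshape(n, D1 * D2)
    G = design.T @ design / n                          # Gram matrix G_m
    V = design.T @ np.asarray(delta, dtype=float) / n  # vector V_m
    # G_m may be singular; lstsq returns one (minimum-norm) solution of G_m A_m = V_m.
    A, *_ = np.linalg.lstsq(G, V, rcond=None)
    return A, design @ A
```

For histogram bases $G_m$ is diagonal, so the fitted value at an observation is simply the mean of the $\delta_i$'s falling in the same cell of the $D_1 \times D_2$ grid.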



2.5. Bias-variance decomposition. The minimization of the contrast $\gamma_n$ over the collection
of models $\mathcal{M}_n$ carried out in Section 2.4 provides a collection of estimators $\{\hat F_m,\ m \in I_n\}$.
Considering the empirical norm $\|\cdot\|_n$, the best model, called the oracle, is the one
which minimizes

$$\mathbb{E}\big[\|\hat F_m - F\|_n^2 \mid \{(X_i,T_i)\}_{i=1,\dots,n}\big]. \tag{6}$$
This model is unknown, but the model selection procedure originally developed by Birgé
and Massart makes it possible to select a model which approaches the oracle. By Pythagoras'
theorem, for every $m \in I_n$ the risk (6) splits into two terms called bias and variance:

$$\mathbb{E}\big[\|\hat F_m - F\|_n^2 \mid \{(X_i,T_i)\}_{i=1,\dots,n}\big] = \|F - F_m\|_n^2 + \mathbb{E}\big[\|F_m - \hat F_m\|_n^2 \mid \{(X_i,T_i)\}_{i=1,\dots,n}\big], \tag{7}$$

where $F_m = \arg\min_{t \in S_m} \|F - t\|_n$. We will build an estimator of this bias-variance sum
and minimize it to select a model $\hat m$ (see Birgé and Massart [1998] for more details).

In order to clarify the calculations with conditional expectations, we adopt the following
notation. Let $(x_i, u_i)_{i=1,\dots,n}$ be in $A^n$; we define the event

$$\Lambda = \{X_1 = x_1, \dots, X_n = x_n, T_1 = u_1, \dots, T_n = u_n\} \tag{8}$$

and for every $s, t \in L^2(A)$ we set

$$(s,t)_0 = \frac{1}{n}\sum_{i=1}^n t(x_i,u_i)\,s(x_i,u_i) \quad\text{and}\quad \|t\|_0^2 = \frac{1}{n}\sum_{i=1}^n t^2(x_i,u_i). \tag{9}$$

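The norm and scalar product $\|\cdot\|_0$ and $(\cdot,\cdot)_0$ are equal to $\|\cdot\|_n$ and $(\cdot,\cdot)_n$ on the event $\Lambda$, thus

$$\mathbb{E}\big[\|\hat F_m - F_m\|_n^2 \mid \{(X_i,T_i)\}_{i=1,\dots,n}\big] = \mathbb{E}\big[\|\hat F_m - F_m\|_0^2 \mid \Lambda\big] \quad\text{and}\quad \|F - F_m\|_n^2 = \|F - F_m\|_0^2 \quad \text{on } \Lambda. \tag{10}$$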



We consider a $\|\cdot\|_0$-orthogonal basis $(\varphi_\lambda)_{\lambda \in I_m}$ of $S_m$ such that $\|\varphi_\lambda\|_0 = 0$ or 1 for every
$\lambda \in I_m$ (Lemma 7.1 states the existence of such a basis). Note that it is only a tool for
the variance upper bound and is not involved in the estimation.



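Upper bound on the variance term $\mathbb{E}[\|\hat F_m - F_m\|_0^2 \mid \Lambda]$. By (10), $F_m = \arg\min_{t \in S_m} \|F - t\|_0$
on the event $\Lambda$, hence $F_m = \sum_{\lambda \in I_m} (F, \varphi_\lambda)_0\,\varphi_\lambda$. Let $\hat F_m = \sum_{\lambda \in I_m} \hat b_\lambda \varphi_\lambda$. Similarly to
Section 2.4, the equality $\hat F_m = \arg\min_{t \in S_m} \gamma_n(t)$ is equivalent to

$$\sum_{\lambda \in I_m} \hat b_\lambda \Big(\frac{1}{n}\sum_{i=1}^n \varphi_\lambda(X_i,T_i)\,\varphi_{\lambda'}(X_i,T_i)\Big) = \frac{1}{n}\sum_{i=1}^n \delta_i\,\varphi_{\lambda'}(X_i,T_i), \quad \forall \lambda' \in I_m. \tag{11}$$

The family $(\varphi_\lambda)_{\lambda \in I_m}$ is $\|\cdot\|_0$-orthogonal, thus on the event $\Lambda$, (11) is equivalent to

$$\hat b_{\lambda'}\,\|\varphi_{\lambda'}\|_0^2 = \frac{1}{n}\sum_{i=1}^n \mathbb{1}_{\{Y_i \le u_i\}}\,\varphi_{\lambda'}(x_i,u_i), \quad \forall \lambda' \in I_m.$$

For every $\lambda$ such that $\|\varphi_\lambda\|_0 \neq 0$, $\hat b_\lambda = (1/n)\sum_{i=1}^n \mathbb{1}_{\{Y_i \le u_i\}}\,\varphi_\lambda(x_i,u_i)$. If $\|\varphi_\lambda\|_0 = 0$, then
$\varphi_\lambda(x_i,u_i) = 0$ for every i; therefore the arbitrary value of $\hat b_\lambda$ does not affect the value
of $\hat F_m(x_i,u_i)$, and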
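$$\mathbb{E}\big[\|\hat F_m - F_m\|_0^2 \mid \Lambda\big] = \sum_{\lambda \in I_m:\ \|\varphi_\lambda\|_0 = 1} \mathbb{E}\Big[\Big(\frac{1}{n}\sum_{i=1}^n \big(\mathbb{1}_{\{Y_i \le u_i\}} - F(x_i,u_i)\big)\varphi_\lambda(x_i,u_i)\Big)^2 \,\Big|\, \Lambda\Big].$$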

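Given $\Lambda$, the $Y_i$'s are independent, and so are the $\mathbb{1}_{\{Y_i \le u_i\}}$'s. Therefore the cross terms
obtained by expanding the square above are equal to 0 for $i \neq j$. Moreover, $\mathbb{1}_{\{Y_i \le u_i\}}$ is a Bernoulli
variable with parameter $F(x_i,u_i)$, whose variance $F(x_i,u_i)(1 - F(x_i,u_i))$ is at most 1/4, and a
variance computation leads to

$$\mathbb{E}\big[\|\hat F_m - F_m\|_0^2 \mid \Lambda\big] = \sum_{\lambda:\ \|\varphi_\lambda\|_0 = 1} \frac{1}{n^2}\sum_{i=1}^n F(x_i,u_i)\big(1 - F(x_i,u_i)\big)\varphi_\lambda^2(x_i,u_i) \le \sum_{\lambda:\ \|\varphi_\lambda\|_0 = 1} \frac{\|\varphi_\lambda\|_0^2}{4n} \le \frac{D_m}{4n}. \tag{12}$$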
Thus the variance term is upper bounded by a term of order $D_m/n$. Moreover, the minimax
study (see Section 4) ensures that $D_m/n$ is the actual order of the variance.
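The bound (12) can be illustrated by a small Monte Carlo experiment, a sketch under simplifying assumptions: we take a histogram model, for which $\hat F_m$ and $F_m$ are the cell-wise means of the $\delta_i$'s and of the $F(x_i,u_i)$'s, we fix the cell of each design point and the values $p_i = F(x_i,u_i)$ (drawn once at random, standing for an arbitrary event $\Lambda$), and we redraw $\delta_i \sim \mathrm{Bernoulli}(p_i)$ independently. The function name and parameters are illustrative.

```python
import numpy as np


def variance_term_mc(n=400, D=16, reps=2000, seed=1):
    """Monte Carlo estimate of E[||F_hat_m - F_m||_0^2 | Lambda] for a histogram model.

    Conditioning on Lambda is mimicked by drawing once, then freezing, the cell of
    each design point (x_i, u_i) among D cells and the values p_i = F(x_i, u_i).
    Returns the Monte Carlo estimate and the bound D / (4n) from (12).
    """
    rng = np.random.default_rng(seed)
    cell = rng.integers(0, D, n)                 # cell containing (x_i, u_i)
    p = rng.uniform(0.0, 1.0, n)                 # F(x_i, u_i), fixed given Lambda
    counts = np.maximum(np.bincount(cell, minlength=D), 1)
    F_m = np.bincount(cell, weights=p, minlength=D) / counts   # projection of F
    errors = np.empty(reps)
    for r in range(reps):
        delta = (rng.uniform(0.0, 1.0, n) < p).astype(float)   # Bernoulli(p_i)
        F_hat = np.bincount(cell, weights=delta, minlength=D) / counts
        errors[r] = np.mean((F_hat[cell] - F_m[cell]) ** 2)    # ||F_hat_m - F_m||_0^2
    return errors.mean(), D / (4 * n)


estimate, bound = variance_term_mc()
print(f"Monte Carlo variance term: {estimate:.5f}  bound D_m/(4n): {bound:.5f}")
```

With $p_i$ uniform on $[0,1]$, the exact value is close to $D_m/(6n)$, since the average of $p(1-p)$ is about 1/6 instead of the worst case 1/4 used in (12).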

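Estimation of the bias term $\|F_m - F\|_0^2$. By definition of $F_m$,

$$\|F_m - F\|_0^2 = \min_{t \in S_m} \frac{1}{n}\sum_{i=1}^n \big(t(x_i,u_i) - F(x_i,u_i)\big)^2 = \min_{t \in S_m} \frac{1}{n}\sum_{i=1}^n \big(t(x_i,u_i) - \mathbb{E}[\delta_i \mid (X_i,T_i) = (x_i,u_i)]\big)^2,$$

which is naturally estimated on $\Lambda$ by

$$\min_{t \in S_m} \frac{1}{n}\sum_{i=1}^n \big(t(x_i,u_i) - \delta_i\big)^2 = \gamma_n(\hat F_m). \tag{13}$$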

3. Definition of the estimator and main result

3.1. Definition of the estimator of F. Consider the collection of models $\mathcal{M}_n$ defined in
Section 2.2, the contrast $\gamma_n$ defined in Section 2.3 and the collection of estimators $\{\hat F_m,\ m \in I_n\}$,
where $\hat F_m$ is defined in (4). Following the model selection procedure presented in Section
2.5 with the variance and bias estimations (12) and (13), we select the model

$$\hat m = \arg\min_{m \in I_n} \big\{\gamma_n(\hat F_m) + \mathrm{pen}(m)\big\},$$

where $\mathrm{pen}(m) = \theta D_m/n$ for some numerical constant $\theta > 1$.
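A minimal sketch of this selection rule, assuming regular histogram models (for which the minimum contrast estimator is the cell-wise mean of the $\delta_i$'s) and observations already restricted to the box A. The grid of dimensions `dims`, the default $\theta = 2$ (the value used in Section 5.1), the toy conditional c.d.f. $F(x,u) = u$ on $[0,1]^2$ and the function names are illustrative choices.

```python
import numpy as np


def histogram_fitted_values(x, t, delta, box, D1, D2):
    """Fitted values hat F_m(X_i, T_i) for a regular D1 x D2 histogram model.

    All observations are assumed to lie in box = (x_lo, x_hi, t_lo, t_hi).
    """
    x_lo, x_hi, t_lo, t_hi = box
    ix = np.clip(((x - x_lo) / (x_hi - x_lo) * D1).astype(int), 0, D1 - 1)
    it = np.clip(((t - t_lo) / (t_hi - t_lo) * D2).astype(int), 0, D2 - 1)
    cell = ix * D2 + it
    counts = np.maximum(np.bincount(cell, minlength=D1 * D2), 1)
    means = np.bincount(cell, weights=delta, minlength=D1 * D2) / counts
    return means[cell]


def select_model(x, t, delta, box, dims, theta=2.0):
    """Return (criterion, D1, D2) minimizing gamma_n(hat F_m) + theta * D_m / n."""
    n = len(delta)
    best = None
    for D1, D2 in dims:
        fitted = histogram_fitted_values(x, t, delta, box, D1, D2)
        criterion = np.mean((fitted - delta) ** 2) + theta * D1 * D2 / n
        if best is None or criterion < best[0]:
            best = (criterion, D1, D2)
    return best


rng = np.random.default_rng(0)
n = 5000
x, t = rng.uniform(0.0, 1.0, n), rng.uniform(0.0, 1.0, n)
delta = (rng.uniform(0.0, 1.0, n) < t).astype(float)   # toy F(x, u) = u
dims = [(1, 1), (1, 2), (2, 1), (2, 2), (1, 4), (4, 1), (4, 4)]
print(select_model(x, t, delta, (0.0, 1.0, 0.0, 1.0), dims))
```

In this toy example F does not depend on x, so the criterion favours models that split only the time axis.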



Besides, the target function F takes values in [0, 1] by definition. We use this information to
improve the estimation by constraining the values of our estimator to remain in the same
interval. More precisely, we consider the estimator $\tilde F_{\hat m}$, where

$$\tilde F_m(x,u) = \begin{cases} 0 & \text{if } \hat F_m(x,u) < 0,\\ 1 & \text{if } \hat F_m(x,u) > 1,\\ \hat F_m(x,u) & \text{otherwise.}\end{cases} \tag{14}$$

Comments
1) The restriction imposed in (14) is not only necessary in order to prove some convergence
results but also improves the estimation. Indeed, since F takes values in [0, 1], for every $m \in I_n$,

$$|\tilde F_m(x,u) - F(x,u)| \le |\hat F_m(x,u) - F(x,u)|, \quad \forall (x,u) \in A,$$

almost surely. In particular, $\|\tilde F_m - F\|_n^2 \le \|\hat F_m - F\|_n^2$. Thus, any upper bound on
$\mathbb{E}[\|\hat F_{\hat m} - F\|_n^2 \mid \{(X_i,T_i)\}_{i=1,\dots,n}]$ is an upper bound on $\mathbb{E}[\|\tilde F_{\hat m} - F\|_n^2 \mid \{(X_i,T_i)\}_{i=1,\dots,n}]$.

2) The condition on $\theta$ could be weakened to $\theta > 1/4$ under the same assumptions with
slight technical changes in the proofs, but we assume that $\theta > 1$ for the sake of simplicity.

3) The convergence results presented in this paper are valid for any $\theta > 1/4$, but in a
practical implementation a value of $\theta$ has to be fixed. It can be either calibrated on
simulated data from a large number of examples, or chosen a priori independently of
the framework (a constant equal to 2 in the penalty is often considered a reasonable
value, see for example Massart [2008]).

4) Note that the constant involved in the penalty is a numerical constant whereas in many 
other frameworks it depends on unknown parameters of the problem and has to be 
estimated. This makes our model selection procedure especially simple to implement. 
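Truncation (14) is a pointwise clipping. The short sketch below (illustrative names and toy data) applies it to unconstrained estimated values and checks the inequality of Comment 1. Note that for histogram models the cell-wise means of the $\delta_i$'s already lie in [0, 1], so (14) matters mainly for bases such as piecewise polynomials, wavelets or trigonometric polynomials.

```python
import numpy as np


def truncate_to_unit_interval(F_hat_values):
    """Estimator (14): values below 0 are set to 0 and values above 1 to 1."""
    return np.clip(F_hat_values, 0.0, 1.0)


rng = np.random.default_rng(2)
F_true = rng.uniform(0.0, 1.0, 1000)                    # any values of F in [0, 1]
F_hat = F_true + rng.normal(0.0, 0.5, F_true.size)       # unconstrained estimates
F_tilde = truncate_to_unit_interval(F_hat)
# Comment 1: |F_tilde - F| <= |F_hat - F| pointwise when F takes values in [0, 1].
print(np.all(np.abs(F_tilde - F_true) <= np.abs(F_hat - F_true)))
```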

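3.2. Risk for the empirical norm. The estimator $\hat F_{\hat m}$ satisfies the following oracle
inequality.

Theorem 3.1. Assume that Assumption (H) holds. There exist numerical constants $C_1$
and $C_2$ such that, almost surely,

$$\mathbb{E}\big[\|\hat F_{\hat m} - F\|_n^2 \mid \{(X_i,T_i)\}_{i=1,\dots,n}\big] \le C_1 \inf_{m \in I_n}\Big\{\inf_{t \in S_m} \|F - t\|_n^2 + \mathrm{pen}(m)\Big\} + \frac{C_2}{n}. \tag{15}$$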
Comments

1) For every model $m \in I_n$, $\inf_{t \in S_m} \|F - t\|_n^2 + \mathrm{pen}(m)$ has the same order as
$\mathbb{E}[\|\hat F_m - F\|_n^2 \mid \{(X_i,T_i)\}_{i=1,\dots,n}]$ (see Section 2.5). Thus Theorem 3.1 indicates that, up to a
multiplicative constant, the model selection estimator $\hat F_{\hat m}$ converges as fast as the best estimator in the collection.

2) It is clear that the same result holds with nonrandom observation times $(T_1, \dots, T_n)$
and nonrandom covariates $(X_1, \dots, X_n)$.

3) According to Comment 1 in Section 3.1,

$$\mathbb{E}\big[\|\tilde F_{\hat m} - F\|_n^2 \mid \{(X_i,T_i)\}_{i=1,\dots,n}\big] \le C_1 \inf_{m \in I_n}\Big\{\inf_{t \in S_m} \|F - t\|_n^2 + \mathrm{pen}(m)\Big\} + \frac{C_2}{n}.$$

4. Convergence of the estimator on anisotropic Besov balls $B^{\beta}_{2,\infty}$

In this section, we prove that our estimator reaches the minimax rate of convergence
over anisotropic Besov balls. As the definition of Besov spaces refers to $L^2$-norms, it
appears natural to consider the risk of our estimator for the integrated norm $\|\cdot\|_{f_{(X,T)}}$.
Convergence results are then derived from Theorem 3.1 under additional assumptions.
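4.1. Definition of anisotropic Besov spaces. We recall the definition of two-dimensional
anisotropic Besov spaces stated, for example, in Hochmuth [2002]. Let $\Omega \subset \mathbb{R}^2$ and
$f \in L^2(\Omega)$. For i = 1 or 2, $r \in \mathbb{N}^*$ and h > 0, let

$$\Delta^r_{h,i}(f)(x,y) = \sum_{k=0}^{r} \binom{r}{k} (-1)^{r-k} f\big((x,y) + k h e_i\big)$$

be the directional difference operator, for every $(x,y) \in \Omega_{r,h,i}$, where $\Omega_{r,h,i} = \{(x,y) \in \Omega,\ (x,y) + r h e_i \in \Omega\}$
and $(e_1, e_2)$ is the canonical basis of $\mathbb{R}^2$. For t > 0, let $\omega_{r,i}(f,t) = \sup_{|h| \le t} \|\Delta^r_{h,i}(f)\|_{L^2(\Omega_{r,h,i})}$
be the directional modulus of smoothness for the $L^2$-norm. Let $\beta = (\beta_1,\beta_2) \in (\mathbb{R}_+^*)^2$ and
$r_i = \lfloor \beta_i \rfloor + 1$. We define the anisotropic Besov space of parameters $(\beta, 2, \infty)$ as

$$B^{\beta}_{2,\infty}(\Omega) = \big\{f \in L^2(\Omega),\ |f|_{B^{\beta}_{2,\infty}(\Omega)} < +\infty\big\},$$

where

$$|f|_{B^{\beta}_{2,\infty}(\Omega)} = \sum_{i=1}^{2} \sup_{t > 0} t^{-\beta_i}\,\omega_{r_i,i}(f,t).$$

We consider the following norm on $B^{\beta}_{2,\infty}(\Omega)$:

$$\|f\|_{B^{\beta}_{2,\infty}(\Omega)} = \|f\|_{L^2(\Omega)} + |f|_{B^{\beta}_{2,\infty}(\Omega)},$$

and for L > 0,

$$B^{\beta}_{2,\infty}(A,L) = \big\{f \in B^{\beta}_{2,\infty}(A),\ \|f\|_{B^{\beta}_{2,\infty}(A)} \le L\big\}.$$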

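4.2. Additional assumptions. We consider bound conditions for $f_{(X,T)}$ as well as additional
assumptions about the collections $\mathcal{M}_n^{(1)}$ and $\mathcal{M}_n^{(2)}$.

(A1) There exist $h_0 > 0$ and $h_1 < +\infty$ such that $h_0 \le f_{(X,T)}(x,u) \le h_1$, $\forall (x,u) \in A$.

(A2) For j = 1 and 2,

$$|\mathcal{M}_n^{(j)}| \le P_j(n), \quad \forall n \in \mathbb{N}, \tag{16}$$

for some polynomial $P_j(\cdot)$. Moreover, there exists a model $S^{(j)}_{N_n^{(j)}} \in \mathcal{M}_n^{(j)}$ of dimension $N_n^{(j)}$
such that $S^{(j)}_{m_j} \subset S^{(j)}_{N_n^{(j)}}$ for every $m_j \in I_n^{(j)}$. Besides, $N_n^{(1)} N_n^{(2)} \le \sqrt{n}/\log n$.

(A3) There exists a positive constant $K_1$ (resp. $K_2$) such that, for every $m_1 \in I_n^{(1)}$
(resp. $m_2 \in I_n^{(2)}$),

$$\sup_{x \in A_1} \sum_{k=1}^{D^{(1)}_{m_1}} \big(\varphi^{m_1}_k(x)\big)^2 \le K_1 D^{(1)}_{m_1} \quad \Big(\text{resp. } \sup_{u \in A_2} \sum_{l=1}^{D^{(2)}_{m_2}} \big(\psi^{m_2}_l(u)\big)^2 \le K_2 D^{(2)}_{m_2}\Big).$$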

Assumption (A2) refers to the number of models in the collection, whereas (A3) depends
on the nature of the models. Assumption (A3) holds in particular if the collections $\mathcal{M}_n^{(1)}$
and $\mathcal{M}_n^{(2)}$ consist of the following models.

(i) $S^{(j)}_{m_j}$ is the set of piecewise polynomials with maximum degree $s_j$ and step $\mathrm{lg}(A_j)/D^{(j)}_{m_j}$,
where $\mathrm{lg}(A_j)$ denotes the length of $A_j$.

(ii) $S^{(j)}_{m_j} = \mathrm{vect}\{\chi_{l,k},\ l \le m_j,\ k \in \mathbb{Z}\}$, where $\chi$ is a mother wavelet with regularity $s_j$,
$\chi_{l,k}(x) = 2^{l/2}\chi(2^l x - k)$ and $D^{(j)}_{m_j} = 2^{m_j}$.

(iii) $S^{(j)}_{m_j}$ is the set of trigonometric polynomials with maximum degree $D^{(j)}_{m_j}$.



4.3. Upper bound of $\|\tilde F_{\hat m} - F\|_{f_{(X,T)}}^2$. Under the additional assumptions of Section
4.2, Theorem 3.1 leads to the following result.

Corollary 4.1. Assume that (H), (A1), (A2) and (A3) hold. Then

$$\mathbb{E}\big[\|\tilde F_{\hat m} - F\|_{f_{(X,T)}}^2\big] \le C_3 \inf_{m \in I_n}\Big\{\inf_{t \in S_m} \|F - t\|_{f_{(X,T)}}^2 + \mathrm{pen}(m)\Big\} + \frac{C_4}{n}, \tag{17}$$

where $C_3$ is a numerical constant and $C_4$ depends on $h_0$ and K.

Comment. Corollary 4.1 indicates that the rate of convergence of $\tilde F_{\hat m}$ for the $\|\cdot\|_{f_{(X,T)}}$-risk
is that of the best estimator among the collection $\{\hat F_m,\ m \in I_n\}$ (see Comment 1 after
Theorem 3.1).

For the models (i)-(iii) described in Section 4.2, the bias term $\inf_{t \in S_m} \|F - t\|^2$, where $\|\cdot\|$
denotes the $L^2(A)$-norm, which bounds the bias term of (17) up to the factor $h_1$ under (A1),
is upper bounded on anisotropic Besov spaces.

Lemma 4.1. Assume that $F \in B^{\beta}_{2,\infty}(A, L)$ for some L > 0 and $\beta = (\beta_1,\beta_2) \in (\mathbb{R}_+^*)^2$, and that
the collection $\mathcal{M}_n$ is built from the linear models (i) or (ii) with $s_j > \beta_j - 1$, or (iii). There
exists a positive constant $C_0$ such that

$$\inf_{t \in S_m} \|F - t\|^2 \le C_0 \Big(\big(D^{(1)}_{m_1}\big)^{-2\beta_1} + \big(D^{(2)}_{m_2}\big)^{-2\beta_2}\Big).$$

Lemma 4.1 is proved in Lacour [2007] based on results of Hochmuth [2002] and
Nikol'skii [1975]. Inserting this result into Corollary 4.1 provides the rate of convergence
of our estimator on anisotropic Besov spaces.

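Corollary 4.2. Assume that $F \in B^{\beta}_{2,\infty}(A, L)$ with $\beta_1, \beta_2 > 1$ and that $\mathcal{M}_n$ is built from
models (i) or (ii) with $s_j > \beta_j - 1$, or (iii). Moreover, assume that (A1) and (A2) hold, and

$$N_n^{(1)} = N_n^{(2)} = n^{1/4}(\log n)^{-1/2}. \tag{18}$$

Then

$$\mathbb{E}\big[\|\tilde F_{\hat m} - F\|^2\big] \le C_5\, n^{-2\bar\beta/(2\bar\beta + 2)}$$

for some positive constant $C_5$, where $\bar\beta = 2\beta_1\beta_2/(\beta_1 + \beta_2)$ is the harmonic mean of $(\beta_1, \beta_2)$.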
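Proof. By Corollary 4.1, (A1) and Lemma 4.1, for every $m = (m_1, m_2)$,

$$\mathbb{E}\big[\|\tilde F_{\hat m} - F\|_{f_{(X,T)}}^2\big] \le C_3\Big(h_1 \inf_{t \in S_m} \|F - t\|^2 + \mathrm{pen}(m)\Big) + \frac{C_4}{n} \le C_3\Big(h_1 C_0 \big((D^{(1)}_{m_1})^{-2\beta_1} + (D^{(2)}_{m_2})^{-2\beta_2}\big) + \theta\frac{D^{(1)}_{m_1} D^{(2)}_{m_2}}{n}\Big) + \frac{C_4}{n},$$

where $C_0$ depends on $(\beta, L)$. Let $m_1$ and $m_2$ be such that

$$1 \le D^{(1)}_{m_1}\, n^{-\beta_2/(\beta_1+\beta_2+2\beta_1\beta_2)} \le 2 \quad\text{and}\quad 1 \le D^{(2)}_{m_2}\, n^{-\beta_1/(\beta_1+\beta_2+2\beta_1\beta_2)} \le 2$$

(Assumption (18) guarantees the existence of such models for $\beta_1, \beta_2 > 1$). Then both bias terms are at most
$n^{-2\beta_1\beta_2/(\beta_1+\beta_2+2\beta_1\beta_2)}$, the variance term is at most $4\theta\, n^{-2\beta_1\beta_2/(\beta_1+\beta_2+2\beta_1\beta_2)}$, and
$2\beta_1\beta_2/(\beta_1+\beta_2+2\beta_1\beta_2) = 2\bar\beta/(2\bar\beta+2)$, so that

$$\mathbb{E}\big[\|\tilde F_{\hat m} - F\|_{f_{(X,T)}}^2\big] \le C_5'\, n^{-2\bar\beta/(2\bar\beta+2)}$$

for some constant $C_5'$. Moreover, by (A1), $\mathbb{E}\big[\|\tilde F_{\hat m} - F\|^2\big] \le (1/h_0)\,\mathbb{E}\big[\|\tilde F_{\hat m} - F\|_{f_{(X,T)}}^2\big]$,
which proves Corollary 4.2. □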
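The exponent bookkeeping in this proof can be verified with exact rational arithmetic. The sketch below (a hypothetical helper) takes $D^{(1)}_{m_1} \approx n^{\beta_2/S}$ and $D^{(2)}_{m_2} \approx n^{\beta_1/S}$ with $S = \beta_1 + \beta_2 + 2\beta_1\beta_2$, and returns the exponents of n in the two bias terms, in the variance term $D_m/n$, and in the target rate $n^{-2\bar\beta/(2\bar\beta+2)}$; all four coincide.

```python
from fractions import Fraction


def rate_exponents(beta1, beta2):
    """Exponents of n for the bias, variance and target terms in the proof of Corollary 4.2."""
    S = beta1 + beta2 + 2 * beta1 * beta2
    e1, e2 = beta2 / S, beta1 / S            # D1 ~ n^e1, D2 ~ n^e2
    bias1 = -2 * beta1 * e1                  # (D1)^(-2 beta1)
    bias2 = -2 * beta2 * e2                  # (D2)^(-2 beta2)
    variance = e1 + e2 - 1                   # D1 * D2 / n
    hmean = 2 * beta1 * beta2 / (beta1 + beta2)
    target = -2 * hmean / (2 * hmean + 2)    # n^(-2 hmean / (2 hmean + 2))
    return bias1, bias2, variance, target


print(rate_exponents(Fraction(3, 2), Fraction(5, 2)))
```

Using `Fraction` avoids floating-point noise, so the four exponents can be compared for exact equality.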



Remark 4.1. The condition $\beta_1, \beta_2 > 1$ in Corollary 4.2 can be generalised to

$$(\beta_1, \beta_2) \in (\beta_1^*, +\infty) \times (\beta_2^*, +\infty)$$

for a known couple $(\beta_1^*, \beta_2^*)$ with $\beta^* > 1$, where $\beta^*$ is the harmonic mean of $\beta_1^*$ and $\beta_2^*$, by
considering $N_n^{(1)}$ and $N_n^{(2)}$ such that

$$N_n^{(1)} \le (\log n)^{-1/2}\, n^{\beta_2^*/(\beta_1^*+\beta_2^*+2\beta_1^*\beta_2^*)} \quad\text{and}\quad N_n^{(2)} \le (\log n)^{-1/2}\, n^{\beta_1^*/(\beta_1^*+\beta_2^*+2\beta_1^*\beta_2^*)}.$$

This alternative assumption allows one to take into account a priori knowledge of the regularity
of F through an appropriate choice of $(N_n^{(1)}, N_n^{(2)})$. The estimation would be optimized by
considering a smaller maximum model size $N_n^{(j)}$ in the direction where F is more
regular.

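4.4. Lower bound. Let $\mathcal{F}$ be a set of conditional cumulative distribution functions on
A. A sequence $(r_n)_{n \in \mathbb{N}}$ of positive numbers is called the minimax rate of convergence for
F over $\mathcal{F}$ if there exist two constants c and C such that

$$c \le \inf_{\hat F_n} \sup_{F \in \mathcal{F}} \big(r_n^{-1}\,\mathbb{E}[\|\hat F_n - F\|^2]\big) \le C,$$

where the infimum is taken over all possible estimators $\hat F_n$. Note that the minimax rate
is defined up to a multiplicative constant.

According to Corollary 4.2, provided that $\beta_1, \beta_2 > 1$,

$$\inf_{\hat F_n} \sup_{F \in B^{\beta}_{2,\infty}(A,L)} \big(n^{2\bar\beta/(2\bar\beta+2)}\,\mathbb{E}[\|\hat F_n - F\|^2]\big) \le \sup_{F \in B^{\beta}_{2,\infty}(A,L)} \big(n^{2\bar\beta/(2\bar\beta+2)}\,\mathbb{E}[\|\tilde F_{\hat m} - F\|^2]\big) \le C.$$

Moreover, the following result holds.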
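Proposition 4.1. Let $\beta = (\beta_1, \beta_2) \in (1, +\infty) \times (1, +\infty)$. Assume that $h_1 = \|f_{(X,T)}\|_\infty <
+\infty$; then there exists a constant c which depends on $(\beta, L, h_1)$ such that

$$\inf_{\hat F_n} \sup_{F \in B^{\beta}_{2,\infty}(A,L)} \mathbb{E}\big[n^{2\bar\beta/(2\bar\beta+2)}\,\|\hat F_n - F\|^2\big] \ge c.$$

Therefore, for every $\beta_1, \beta_2 > 1$, the minimax rate of convergence over $B^{\beta}_{2,\infty}(A,L)$ is
$n^{-2\bar\beta/(2\bar\beta+2)}$, and $\tilde F_{\hat m}$ is minimax. This proves that our estimator adapts to the unknown
regularity $\beta$ of the function F.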

5. Graphical results on simulated data. 

In this section, we present the performance of the estimator $\tilde F_{\hat m}$ on simulated data. In
particular, we study the impact of the distance between the distributions of Y and T on
the estimation accuracy.

5.1. About the numerical implementation. We have chosen to implement the procedure
in a histogram basis. The bases of functions in which the estimator is computed are
supposed to be fixed independently of the data, and the estimation error is bounded on
a set included in the support of the distribution of $(X, T)$. In practical cases this support
is usually unknown and has to be estimated from the data. In our implementation, we
consider histograms supported on the set

A = [quantile(0.01, X), quantile(0.99, X)] × [quantile(0.01, T), quantile(0.99, T)].

We have chosen the constant $\theta = 2$ in the penalty, but the estimation results seem quite
robust to changes in this value.

5.2. Results for several sample sizes. We consider the following distribution of $(X, Y, T)$:

$X \sim \mathcal{U}([0, 3])$,
$Y = X + \varepsilon$ with $\varepsilon \sim \mathrm{Exp}(1)$,
$T = X + \varepsilon'$ with $\varepsilon' \sim \mathrm{Exp}(1)$.
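The simulation design above, together with the estimation box A of Section 5.1, can be reproduced along the following lines. This is a sketch: the independence of $\varepsilon$, $\varepsilon'$ and X is an assumption (it makes Y and T independent given X, as the model requires), the offset argument `a` anticipates the design of Section 5.3 (a = 0 gives the design above), and the function names are hypothetical. Under this design, $F(x, y) = 1 - e^{-(y - x)}$ for $y \ge x$ and 0 otherwise.

```python
import numpy as np


def simulate_current_status(n, a=0.0, seed=None):
    """Draw (X_i, T_i, delta_i), i = 1..n: X ~ U([0, 3]), Y = X + eps, T = a + X + eps'."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.0, 3.0, n)
    y = x + rng.exponential(1.0, n)          # lifetime Y (not returned: unobserved)
    t = a + x + rng.exponential(1.0, n)      # observation time T
    delta = (y <= t).astype(float)           # current status at time T
    return x, t, delta


def true_conditional_cdf(x, y):
    """F(x, y) = P[Y <= y | X = x] = 1 - exp(-(y - x)) for y >= x, 0 otherwise."""
    gap = np.maximum(np.asarray(y, dtype=float) - x, 0.0)
    return 1.0 - np.exp(-gap)


def estimation_box(x, t):
    """Set A of Section 5.1 built from empirical 1% and 99% quantiles of X and T."""
    return (np.quantile(x, 0.01), np.quantile(x, 0.99),
            np.quantile(t, 0.01), np.quantile(t, 0.99))


x, t, delta = simulate_current_status(10_000, seed=0)
print(estimation_box(x, t), delta.mean(), true_conditional_cdf(2.0, 3.3))
```

With a = 0, $\mathbb{P}[\delta = 1] = \mathbb{P}[\varepsilon \le \varepsilon'] = 1/2$, which gives a quick check of a simulated sample.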

Figure 1 presents the conditional distribution function $F(x, y)$ as well as its estimators
for several values of n. The same functions are plotted for a fixed x, x = 2, and a fixed
y, y = 3.3, respectively on the first and second rows of Figure 2. As expected, the accuracy
of the estimation increases with the sample size, but we notice that a fairly large
sample is required to get a correct estimation. This is not surprising given the
nature of the current status framework, in which the observed data give very incomplete
information about the variable of interest. Besides, we note that the estimation of the
dependence of F on y when x is fixed is substantially better than that of the dependence on
x. The same phenomenon is observed with other distributions of the variables.

5.3. Results for several distributions of the observation time. In a right-censoring 
framework, the rate of censoring (defined as the expected proportion of observations that 
are censored) is a parameter that impacts the accuracy of the estimation: the lower the 
rate of censoring, the better the estimation. Indeed as the rate of censoring decreases, the 
proportion of survival times actually observed increases and the estimation of the survival 
time distribution gets better. 



Figure 1. Conditional distribution function $F(x,y)$ of Y given X (panel "True function F") and
estimator $\tilde F_{\hat m}$ for sample sizes n = 500, 1000, 10000 (remaining panels).




Figure 2. Conditional distribution function $F(x, y)$ of Y given X for x = 2
(first row) and for y = 3.3 (second row), for sample sizes n = 500, 1000, 10000 (one panel each).

In the interval censoring framework, the censoring rate is not defined. Nevertheless,
as confirmed by simulations, the estimation accuracy is expected to increase as
the distance between the distributions of $(X, T)$ and $(X, Y)$ decreases. Indeed, for a
fixed $X = x$, the function $y \mapsto F(x, y)$ varies more on a set where $f_{(X,Y)}$ is high and less on
a set where $f_{(X,Y)}$ is small. Thus the estimation of F improves if the observations $T_i$
are concentrated on a set where $f_{(X,Y)}$ is high. Conversely, if the main supports of
$f_{(X,T)}$ and $f_{(X,Y)}$ are disjoint, the observations $T_i$ provide no information about
the distribution of Y and the estimation is impossible.
Figure 3. Conditional distribution function $F(x,y)$ of Y given X and
estimator $\tilde F_{\hat m}$ for n = 3000 and for an offset $a \in \{0, 0.5, 1, 2\}$ of the observation
time T (one panel per value of a).

The link between the estimation error and the distance between $f_{(X,T)}$ and $f_{(X,Y)}$ is
implicit in the theoretical results, since the risk bounded in Corollary 4.1 is weighted
by $f_{(X,T)}$.

In Section 5.2 we have considered a measurement time T with the same conditional distribution
as Y. Now we add an offset a = 0.5, 1, 2 to the distribution of T:

$X \sim \mathcal{U}([0, 3])$,
$Y = X + \varepsilon$ with $\varepsilon \sim \mathrm{Exp}(1)$,
$T = a + X + \varepsilon'$ with $\varepsilon' \sim \mathrm{Exp}(1)$.

Then the $L^1$-distance between $f_{(X,T)}$ and $f_{(X,Y)}$ is equal to $2(1 - \exp(-a))$ and increases
with a.
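The value $2(1 - \exp(-a))$ can be checked numerically. Since both joint densities share the factor $f_X$, the $L^1$-distance reduces to the $L^1$-distance between the density of $\varepsilon$ (Exp(1)) and that of $a + \varepsilon'$. The sketch below uses a plain trapezoidal rule on a truncated grid; the function name, grid size and truncation point are arbitrary choices.

```python
import numpy as np


def l1_distance_offset(a, s_max=60.0, num=600_001):
    """Trapezoidal approximation of the L1 distance between Exp(1) and a + Exp(1)."""
    s = np.linspace(0.0, s_max, num)
    f_y = np.exp(-s)                                        # density of Y - X
    f_t = np.where(s >= a, np.exp(-np.abs(s - a)), 0.0)     # density of T - X
    g = np.abs(f_y - f_t)
    ds = s[1] - s[0]
    return ds * (g.sum() - 0.5 * (g[0] + g[-1]))            # trapezoidal rule


for a in (0.5, 1.0, 2.0):
    print(a, l1_distance_offset(a), 2.0 * (1.0 - np.exp(-a)))
```

The two halves of the closed form correspond to the mass of $\varepsilon$ below a, namely $1 - e^{-a}$, and to $\int_a^\infty (e^{a} - 1)e^{-s}\,ds = 1 - e^{-a}$.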

The true function F is the same as in Figure 1. Figure 3 presents $\tilde F_{\hat m}$ for several values
of a and for n = 3000. The same plots are presented in Figure 4 for a fixed x = 2 (first
row) and for a fixed y = 3.3 (second row). The product of intervals used to compute the
estimators is the same as described in Section 5.1, but the plots are represented on a set
which contains the main support of $(X, Y)$:

I = [quantile(0.01, X), quantile(0.99, X)] × [0, quantile(0.99, Y)].

As a increases (corresponding to an increase of the distance between $f_{(X,T)}$ and
$f_{(X,Y)}$), the estimation deteriorates. In particular, for a = 2, $\|f_{(X,T)} - f_{(X,Y)}\|_1 = 2(1 - e^{-2}) \approx 1.73$ is
close to 2, which indicates that the distributions of $(X, T)$ and $(X, Y)$ hardly overlap, and
the estimation is very poor despite a large sample size.

6. Proofs 

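6.1. Proof of Lemma 2.1. Let $(x,u) \in A$ be such that $f_{(X,T)}(x,u) > 0$. Then

$$\mathbb{E}[\delta_i \mid (X_i,T_i) = (x,u)] = \mathbb{E}[\mathbb{1}_{\{Y_i \le T_i\}} \mid (X_i,T_i) = (x,u)] = \int_0^{+\infty} \mathbb{1}_{\{y \le u\}}\, \frac{f_{(X,Y,T)}(x,y,u)}{f_{(X,T)}(x,u)}\,dy.$$

Y and T are independent given X, hence $f_{(X,Y,T)}(x,y,u) = f_{Y|X}(y,x)\,f_{(X,T)}(x,u)$ and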
Figure 4. Conditional distribution function $F(x,y)$ of Y given X (red
line) and estimator (black dotted line) for n = 3000 and for an offset
$a \in \{0, 0.5, 1, 2\}$ of the observation time T (one panel per value of a). In the first column, x = 2,
and in the second column, y = 3.3.



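$$\mathbb{E}[\delta_i \mid (X_i,T_i) = (x,u)] = \int_0^{+\infty} \mathbb{1}_{\{y \le u\}}\, \frac{f_{Y|X}(y,x)\,f_{(X,T)}(x,u)}{f_{(X,T)}(x,u)}\,dy = \int_0^{+\infty} \mathbb{1}_{\{y \le u\}}\, f_{Y|X}(y,x)\,dy = F(x,u).$$

Since $t(X_i,T_i)$ is a function of $(X_i,T_i)$, this yields Lemma 2.1. □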



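6.2. Proof of Theorem 3.1. Let $m = (m_1,m_2) \in I_n$ and $F_m \in S_m$. By definition of $\hat m$
and $\hat F_m$,

$$\gamma_n(\hat F_{\hat m}) + \mathrm{pen}(\hat m) \le \gamma_n(\hat F_m) + \mathrm{pen}(m) \le \gamma_n(F_m) + \mathrm{pen}(m). \tag{19}$$

Besides, for every $s, t \in S_n$,

$$\gamma_n(t) - \gamma_n(s) = \frac{1}{n}\sum_{i=1}^n \Big[\big(t(X_i,T_i) - \delta_i\big)^2 - \big(s(X_i,T_i) - \delta_i\big)^2\Big] = \|t - F\|_n^2 - \|s - F\|_n^2 - 2\nu_n(t - s), \tag{20}$$

where $\nu_n(t) = (1/n)\sum_{i=1}^n \big(\delta_i - F(X_i,T_i)\big)\,t(X_i,T_i)$. Thus (19) implies

$$\|\hat F_{\hat m} - F\|_n^2 \le \|F_m - F\|_n^2 + \mathrm{pen}(m) - \mathrm{pen}(\hat m) + 2\nu_n(\hat F_{\hat m} - F_m)$$
$$\le \|F_m - F\|_n^2 + \mathrm{pen}(m) - \mathrm{pen}(\hat m) + 2\|\hat F_{\hat m} - F_m\|_n \sup_{t \in S_m + S_{\hat m},\ \|t\|_n \le 1} \nu_n(t).$$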

This last inequality is the main difference with the integrated risk framework (see e.g.
Massart [2007]). In that context, terms such as $\nu_n(\hat f_{\hat m} - f_m)$ are upper bounded by
$\|\hat f_{\hat m} - f_m\|_g \sup_{t \in S_m + S_{\hat m},\ \|t\|_g \le 1} \nu_n(t)$, where $\|\cdot\|_g$ is the $L^2$-norm associated with a suitable
function g. Technically, this change requires a non-i.i.d. version of Talagrand's inequality
(Theorem 7.1) instead of the more classical i.i.d. version (see e.g. Lacour [2008], Section
6, Lemma 5). As a consequence, weaker assumptions are required and smaller constants
are obtained in the upper bounds.



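For every function $p(m, m')$ of m and m', using $2ab \le a^2/4 + 4b^2$,

$$\|\hat F_{\hat m} - F\|_n^2 \le \|F_m - F\|_n^2 + \mathrm{pen}(m) - \mathrm{pen}(\hat m) + \frac{1}{4}\|\hat F_{\hat m} - F_m\|_n^2 + 4\sup_{t \in S_m + S_{\hat m},\ \|t\|_n \le 1} (\nu_n(t))^2$$
$$= \|F_m - F\|_n^2 + \mathrm{pen}(m) - \mathrm{pen}(\hat m) + 4p(m,\hat m) + \frac{1}{4}\|\hat F_{\hat m} - F_m\|_n^2 + 4\Big(\sup_{t \in S_m + S_{\hat m},\ \|t\|_n \le 1} (\nu_n(t))^2 - p(m,\hat m)\Big).$$

Now, consider $p(m,m') = (1/4)(\mathrm{pen}(m) + \mathrm{pen}(m'))$. Since $\|\hat F_{\hat m} - F_m\|_n^2 \le 2\|\hat F_{\hat m} - F\|_n^2 + 2\|F_m - F\|_n^2$,

$$\|\hat F_{\hat m} - F\|_n^2 \le \|F_m - F\|_n^2 + 2\,\mathrm{pen}(m) + \frac{1}{4}\big(2\|\hat F_{\hat m} - F\|_n^2 + 2\|F_m - F\|_n^2\big) + 4\Big(\sup_{t \in S_m + S_{\hat m},\ \|t\|_n \le 1} (\nu_n(t))^2 - p(m,\hat m)\Big)$$

and

$$\frac{1}{2}\|\hat F_{\hat m} - F\|_n^2 \le \frac{3}{2}\|F_m - F\|_n^2 + 2\,\mathrm{pen}(m) + 4\sum_{m' \in I_n}\Big(\sup_{t \in S_m + S_{m'},\ \|t\|_n \le 1} (\nu_n(t))^2 - p(m,m')\Big)_+. \tag{21}$$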

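The following result is derived from Talagrand's inequality (Theorem 7.1).

Lemma 6.1. There exist numerical constants $C_0$ and $\kappa_0$, which only depend on the constant
$\theta$ in the penalty, such that, for every $m, m' \in I_n$, $(x_1, \dots, x_n) \in A_1^n$ and $(u_1, \dots, u_n) \in A_2^n$,

$$\mathbb{E}\Big[\Big(\sup_{t \in S_m + S_{m'},\ \|t\|_0 \le 1} \Big(\frac{1}{n}\sum_{i=1}^n \big(\mathbb{1}_{\{Y_i \le u_i\}} - F(x_i,u_i)\big)t(x_i,u_i)\Big)^2 - p(m,m')\Big)_+ \,\Big|\, \Lambda\Big] \le \frac{C_0 \exp\big(-\kappa_0\sqrt{D_m + D_{m'}}\big)}{n},$$

where $\Lambda$ is defined in (8).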

Therefore, after plugging the result of Lemma 6.1 into (21), Assumption (H) leads to

$$\mathbb{E}\big[\|\hat F_{\hat m} - F\|_n^2 \mid \{(X_i,T_i)\}_{i=1,\dots,n}\big] \le 3\|F - F_m\|_n^2 + 4\,\mathrm{pen}(m) + \frac{B}{n}$$

for some numerical constant B, which concludes the proof of Theorem 3.1. □



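Proof of Lemma 6.1

The proof relies on Talagrand's inequality (Theorem 7.1). Let $(x_1, \dots, x_n) \in A_1^n$, $(u_1, \dots, u_n) \in A_2^n$
and $m, m' \in I_n$. Let

$$\mu_n(t) = \frac{1}{n}\sum_{i=1}^n \big(\mathbb{1}_{\{Y_i \le u_i\}} - F(x_i,u_i)\big)\,t(x_i,u_i).$$

Then

$$Z = \sup_{t \in S_m + S_{m'},\ \|t\|_0 \le 1} \mu_n(t) = \sup_{f \in \mathcal{F}_{m,m'}} \frac{1}{n}\sum_{i=1}^n f^{(i)}\big(\mathbb{1}_{\{Y_i \le u_i\}}\big),$$

where $\|\cdot\|_0$ is defined in (9) and $\mathcal{F}_{m,m'}$ is the following set of functions from $\mathbb{R}$ to $\mathbb{R}^n$:

$$\mathcal{F}_{m,m'} = \big\{f = (f^{(1)}, \dots, f^{(n)}),\ f^{(i)}(z) = t(x_i,u_i)\big(z - F(x_i,u_i)\big),\ t \in S_m + S_{m'} \text{ and } \|t\|_0 \le 1\big\}.$$

Let $(\varphi_\lambda)_{\lambda=1,\dots,D_{m+m'}}$ be a $\|\cdot\|_0$-orthogonal basis of $S_m + S_{m'}$ such that $\|\varphi_\lambda\|_0 = 0$ or
1, where $D_{m+m'}$ denotes the dimension of $S_m + S_{m'}$ (see Lemma 7.1). Let $\Gamma = \{\lambda \in \{1, \dots, D_{m+m'}\},\ \|\varphi_\lambda\|_0 \neq 0\}$;
then for every $t = \sum_{\lambda=1}^{D_{m+m'}} a_\lambda \varphi_\lambda \in S_m + S_{m'}$, $\|t\|_0^2 = \sum_{\lambda \in \Gamma} a_\lambda^2$.
In order to apply Theorem 7.1 to Z, we have to compute b, v and H. First we compute
the term H.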



sup ifJ-nit))' 

teSm+s^,, \\t\\o<i 



A 



E 



'D, 



m-\-m' 



E «i<i \ A=l 



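Besides, for every $\lambda\notin\Gamma$, $\mu_n(\varphi_\lambda)=0$, since $\|\varphi_\lambda\|_0=0$ means that $\varphi_\lambda(x_i,u_i)=0$ for every $i$. Therefore, by the Cauchy-Schwarz inequality,
$$\mathbb E[Z^2]=\mathbb E\Big[\sup_{\sum_{\lambda\in\Gamma}a_\lambda^2\le1}\Big(\sum_{\lambda\in\Gamma}a_\lambda\,\mu_n(\varphi_\lambda)\Big)^2\Big]\le\mathbb E\Big[\sup_{\sum_{\lambda\in\Gamma}a_\lambda^2\le1}\Big(\sum_{\lambda\in\Gamma}a_\lambda^2\Big)\Big(\sum_{\lambda\in\Gamma}\mu_n^2(\varphi_\lambda)\Big)\Big]\le\sum_{\lambda\in\Gamma}\mathbb E\Big[\Big(\frac1n\sum_{i=1}^n\big(\mathbb 1_{\{Y_i\le u_i\}}-F(x_i,u_i)\big)\varphi_\lambda(x_i,u_i)\Big)^2\Big].$$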



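In the same way as the upper bound of the variance term in Section 2.5, since the $Y_i$ are independent and the variance of an indicator is at most $1/4$, we obtain
$$\mathbb E[Z^2]\le\frac1{n^2}\sum_{\lambda\in\Gamma}\sum_{i=1}^n\mathrm{Var}\big(\mathbb 1_{\{Y_i\le u_i\}}\big)\varphi_\lambda^2(x_i,u_i)\le\frac1{4n}\sum_{\lambda\in\Gamma}\Big(\frac1n\sum_{i=1}^n\varphi_\lambda^2(x_i,u_i)\Big)=\frac1{4n}\sum_{\lambda\in\Gamma}\|\varphi_\lambda\|_0^2=\frac{|\Gamma|}{4n}\le\frac{D_m+D_{m'}}{4n}.$$
As $\mathbb E[Z]\le(\mathbb E[Z^2])^{1/2}$, we can take $H^2=(D_m+D_{m'})/(4n)$.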
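To see the variance computation at work, the sketch below compares the exact value of $\sum_{\lambda\in\Gamma}\mathbb E[\mu_n^2(\varphi_\lambda)]$ with a Monte Carlo estimate of $\mathbb E[Z^2]$ and with the bound $|\Gamma|/(4n)$. It assumes a hypothetical toy design with $n=8$ points, arbitrary values standing for $F(x_i,u_i)$, and three $\pm1$ vectors playing the role of $\|\cdot\|_0$-orthonormal functions $\varphi_\lambda$; the function names are ours, not the paper's.

```python
import random


def exact_EZ2(F_vals, phis):
    """Exact value of sum_lambda E[mu_n(phi_lambda)^2] for a fixed design.

    F_vals[i] plays the role of F(x_i, u_i) and each phi in phis lists the
    values phi_lambda(x_i, u_i), i = 1..n. The indicators are independent, so
    cross terms vanish: E[mu_n(phi)^2] = (1/n^2) sum_i F_i (1 - F_i) phi_i^2.
    """
    n = len(F_vals)
    return sum(
        sum(p * (1.0 - p) * v * v for p, v in zip(F_vals, phi)) for phi in phis
    ) / n ** 2


def monte_carlo_EZ2(F_vals, phis, reps=20000, seed=0):
    """Monte Carlo estimate of E[Z^2] when the phis are orthonormal for ||.||_0.

    For an orthonormal family, the sup over {||t||_0 <= 1} of mu_n(t)^2 equals
    sum_lambda mu_n(phi_lambda)^2 (Cauchy-Schwarz with equality).
    """
    rng = random.Random(seed)
    n = len(F_vals)
    total = 0.0
    for _ in range(reps):
        # draw 1{Y_i <= u_i} ~ Bernoulli(F(x_i, u_i)) and center it
        resid = [(1.0 if rng.random() < p else 0.0) - p for p in F_vals]
        total += sum(
            (sum(r * v for r, v in zip(resid, phi)) / n) ** 2 for phi in phis
        )
    return total / reps


# Hypothetical toy design: n = 8 points and three +/-1 vectors, orthonormal
# for ||t||_0^2 = (1/n) sum_i t(x_i, u_i)^2, so that |Gamma| = 3.
PHIS = [
    [1, 1, 1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, -1, -1, -1, -1],
    [1, 1, -1, -1, 1, 1, -1, -1],
]
F_VALS = [0.1, 0.3, 0.5, 0.7, 0.9, 0.2, 0.4, 0.6]
bound = len(PHIS) / (4 * len(F_VALS))  # |Gamma| / (4n)
```

With all $F(x_i,u_i)=1/2$ the bound $|\Gamma|/(4n)=3/32$ is attained exactly, since a Bernoulli$(1/2)$ indicator has variance exactly $1/4$.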



Now we compute the terms $b$ and $v$. Since $\mathbb 1_{\{Y_i\le u_i\}}$ and $F(x_i,u_i)$ lie in $[0,1]$, $|\mathbb 1_{\{Y_i\le u_i\}}-F(x_i,u_i)|\le1$ a.s. Moreover, let $t\in S_m+S_{m'}$ be such that $\|t\|_0\le1$; for every $i\in\{1,\dots,n\}$,
$$t^2(x_i,u_i)\le\sum_{l=1}^nt^2(x_l,u_l)=n\|t\|_0^2\le n.$$
Thus
$$\sup_{t\in S_m+S_{m'},\,\|t\|_0\le1}\Big(\sup_{i=1,\dots,n}\big|\big(\mathbb 1_{\{Y_i\le u_i\}}-F(x_i,u_i)\big)t(x_i,u_i)\big|\Big)\le\sqrt n=b.$$



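Moreover
$$\sup_{t\in S_m+S_{m'},\,\|t\|_0\le1}\frac1n\sum_{i=1}^n\mathrm{Var}\Big(\big(\mathbb 1_{\{Y_i\le u_i\}}-F(x_i,u_i)\big)t(x_i,u_i)\,\Big|\,X_i=x_i,T_i=u_i\Big)$$
$$=\sup_{t\in S_m+S_{m'},\,\|t\|_0\le1}\frac1n\sum_{i=1}^n\mathbb E\Big[\big(\mathbb 1_{\{Y_i\le u_i\}}-F(x_i,u_i)\big)^2\,\Big|\,X_i=x_i,T_i=u_i\Big]t^2(x_i,u_i)\le\frac14\sup_{t\in S_m+S_{m'},\,\|t\|_0\le1}\|t\|_0^2=\frac14=v.$$
Finally, recalling that $p(m,m')=\frac14\big(\mathrm{pen}(m)+\mathrm{pen}(m')\big)=\theta H^2$ with $\theta>1$, Theorem 7.1 concludes the proof of Lemma 6.1. □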
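The final step is a direct substitution, sketched here under the assumption (stated in the last sentence) that $p(m,m')=\theta H^2$, with $H^2=(D_m+D_{m'})/(4n)$, $b=\sqrt n$ and $v=1/4$. The quantities entering Theorem 7.1 become
$$\frac{nH^2}v=D_m+D_{m'},\qquad\frac{nH}b=\frac{\sqrt{D_m+D_{m'}}}2,\qquad\frac vn=\frac1{4n},\qquad\frac{b^2}{n^2}=\frac1n,$$
so that
$$\mathbb E\big[(Z^2-\theta H^2)_+\big]\le\frac C{4n}e^{-\kappa(D_m+D_{m'})}+\frac{C'}ne^{-\kappa'\sqrt{D_m+D_{m'}}/2}\le\frac{C_\theta}ne^{-\kappa_\theta\sqrt{D_m+D_{m'}}},$$
where we used $D_m+D_{m'}\ge\sqrt{D_m+D_{m'}}$ (as $D_m+D_{m'}\ge1$). One admissible choice is $C_\theta=C/4+C'$ and $\kappa_\theta=\min(\kappa,\kappa'/2)$; this particular choice is ours. Because the class of functions is symmetric, $Z\ge0$ and $Z^2=\sup_t(\mu_n(t))^2$, which is the quantity appearing in Lemma 6.1.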



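6.3. Proof of Corollary 4.1. The proof is divided into two propositions. Let
$$\Omega=\Big\{\sup_{t\in S_n\setminus\{0\}}\Big|\frac{\|t\|_n^2}{\|t\|_{f(X,T)}^2}-1\Big|\le\frac12\Big\}.$$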



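Proposition 6.1. Under the assumptions of Corollary 4.1,
$$\mathbb E\big[\|\hat F_{\hat m}-F\|_{f(X,T)}^2\mathbb 1_\Omega\big]\le C_1'\inf_{m\in\mathcal M_n}\Big\{\inf_{t\in S_m}\|F-t\|_{f(X,T)}^2+\mathrm{pen}(m)\Big\}+\frac{C_2'}n,$$
where $C_1'$ and $C_2'$ are numerical constants.

Proposition 6.2. Under the assumptions of Corollary 4.1,
$$\mathbb E\big[\|\hat F_{\hat m}-F\|_{f(X,T)}^2\mathbb 1_{\Omega^c}\big]\le\frac{C_0}n,$$
where $C_0$ depends on $h_0$ and $K$.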



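6.3.1. Proof of Proposition 6.1. First of all, $G_m$ is the Gram matrix of the orthonormal basis $\{\varphi_j^{m_1}\psi_k^{m_2}\}_{(j,k)\in J_m}$ with respect to the empirical scalar product $\langle\cdot,\cdot\rangle_n$. Lemma 3.1 in Baraud [2000] indicates that the smallest eigenvalue of $G_m$ equals $\inf_{t\in S_m\setminus\{0\}}\|t\|_n^2/\|t\|_{f(X,T)}^2$. Then, on $\Omega$, this eigenvalue is at least $1/2>0$, hence $G_m$ is invertible for every $m$.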
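Lemma 3.1 in Baraud [2000] is a Rayleigh-quotient identity. The sketch below checks it numerically in a deliberately simplified setting that is not the paper's: a one-dimensional design, a uniform design density (so that $\|\cdot\|_{f(X,T)}$ reduces to the $L^2([0,1])$ norm) and a two-function basis $\varphi_1=1$, $\varphi_2=\sqrt3(2u-1)$, orthonormal in $L^2([0,1])$. All names are hypothetical.

```python
import math
import random


def basis(u):
    """phi_1 = 1 and phi_2 = sqrt(3)(2u - 1): orthonormal in L^2([0, 1])."""
    return (1.0, math.sqrt(3.0) * (2.0 * u - 1.0))


def empirical_gram(points):
    """G[j][k] = (1/n) sum_i phi_j(u_i) phi_k(u_i), i.e. <phi_j, phi_k>_n."""
    n = len(points)
    g = [[0.0, 0.0], [0.0, 0.0]]
    for u in points:
        phi = basis(u)
        for j in range(2):
            for k in range(2):
                g[j][k] += phi[j] * phi[k] / n
    return g


def smallest_eigenvalue(g):
    """Closed form for the smallest eigenvalue of a symmetric 2x2 matrix."""
    a, b, d = g[0][0], g[0][1], g[1][1]
    return (a + d) / 2.0 - math.sqrt(((a - d) / 2.0) ** 2 + b * b)


def inf_norm_ratio(points, steps=20000):
    """Grid approximation of inf ||t||_n^2 / ||t||^2 over t = a1 phi_1 + a2 phi_2.

    The basis is L^2-orthonormal, so ||t||^2 = a1^2 + a2^2 and it suffices to
    scan unit vectors (a1, a2) = (cos w, sin w) with w in [0, pi).
    """
    values = [basis(u) for u in points]
    n = len(points)
    best = float("inf")
    for s in range(steps):
        w = math.pi * s / steps
        c, si = math.cos(w), math.sin(w)
        best = min(best, sum((c * p0 + si * p1) ** 2 for p0, p1 in values) / n)
    return best
```

On the event $\Omega$ the same quantity would be bounded below by $1/2$, which is what makes $G_m$ invertible.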



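Moreover, let $F_n=\arg\min_{t\in S_n}\|F-t\|_{f(X,T)}$ be the projection of $F$ on the global model $S_n$. Then $\hat F_{\hat m}-F_n\in S_n$, hence, by Pythagoras' theorem,
$$\|\hat F_{\hat m}-F\|_{f(X,T)}^2=\|\hat F_{\hat m}-F_n\|_{f(X,T)}^2+\|F_n-F\|_{f(X,T)}^2.$$
Thus, by definition of $\Omega$,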


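$$\mathbb E\big[\|\hat F_{\hat m}-F\|_{f(X,T)}^2\mathbb 1_\Omega\big]\le2\,\mathbb E\big[\|\hat F_{\hat m}-F_n\|_n^2\mathbb 1_\Omega\big]+\|F_n-F\|_{f(X,T)}^2$$
$$\le4\,\mathbb E\big[\|\hat F_{\hat m}-F\|_n^2\mathbb 1_\Omega\big]+4\,\mathbb E\big[\|F-F_n\|_n^2\mathbb 1_\Omega\big]+\|F_n-F\|_{f(X,T)}^2$$
$$\le4\,\mathbb E\big[\|\hat F_{\hat m}-F\|_n^2\big]+5\|F_n-F\|_{f(X,T)}^2,$$
where the last step uses $\mathbb E\big[\|g\|_n^2\big]=\|g\|_{f(X,T)}^2$ for any deterministic function $g$.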

On the one hand, $\mathbb E\big[\|\hat F_{\hat m}-F\|_n^2\big]$ is upper bounded with Comment 3 after Theorem 3.1. On the other hand, for every $m\in\mathcal M_n$, $S_m\subset S_n$, so
$$\|F_n-F\|_{f(X,T)}^2=\inf_{t\in S_n}\|F-t\|_{f(X,T)}^2\le\inf_{t\in S_m}\|F-t\|_{f(X,T)}^2.$$
Besides, according to Comment 1 in Section 3.1, $\mathbb E\big[\|F_m-F\|_n^2\big]\le\inf_{t\in S_m}\|F-t\|_{f(X,T)}^2$, which ends the proof of Proposition 6.1. □



6.3.2. Proof of Proposition 6.2. The proof is based on the following lemma.

Lemma 6.2. Under the assumptions of Theorem 3.1,
$$\mathbb P[\Omega^c]\le2(N_n)^2\exp\Big(-\frac{3-2\sqrt2}2\cdot\frac{nh_0}{(N_n)^2K^2}\Big).\tag{22}$$



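Assume that Lemma 6.2 holds. On the one hand, for every $(x,u)\in A$, $\hat F_{\hat m}(x,u)$ and $F(x,u)$ lie in $[0,1]$, hence $\|\hat F_{\hat m}-F\|_{f(X,T)}^2\le1$. On the other hand, let $c_0=h_0(3-2\sqrt2)/(2K^2)$. Then
$$\mathbb E\big[\|\hat F_{\hat m}-F\|_{f(X,T)}^2\mathbb 1_{\Omega^c}\big]\le\mathbb P[\Omega^c]\le2(N_n)^2\exp\Big(-c_0\frac n{(N_n)^2}\Big)\le2n\exp\big(-c_0\log^2n\big)\le\frac{C_0}n.\qquad\square$$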



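Proof of Lemma 6.2. Let $\{\chi_\lambda,\lambda\in J_n\}$ be an $\|\cdot\|_{f(X,T)}$-orthonormal basis of the global space $S_n$, where $J_n$ is the set of indices defined in (2) for $D_{m_1}=N_n^{(1)}$ and $D_{m_2}=N_n^{(2)}$. Denote
$$S_{\lambda,\lambda'}=\frac1n\sum_{i=1}^n\Big(\chi_\lambda(X_i,T_i)\chi_{\lambda'}(X_i,T_i)-\mathbb E\big[\chi_\lambda(X_i,T_i)\chi_{\lambda'}(X_i,T_i)\big]\Big).$$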



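By definition of $\Omega$, and since for $t=\sum_{\lambda\in J_n}a_\lambda\chi_\lambda$ we have $\|t\|_{f(X,T)}^2=\sum_{\lambda}a_\lambda^2$ and $\|t\|_n^2-\|t\|_{f(X,T)}^2=\sum_{(\lambda,\lambda')}a_\lambda a_{\lambda'}S_{\lambda,\lambda'}$,
$$\mathbb P[\Omega^c]=\mathbb P\Big[\sup_{\sum_{\lambda\in J_n}a_\lambda^2=1}\Big|\sum_{(\lambda,\lambda')\in J_n^2}a_\lambda a_{\lambda'}S_{\lambda,\lambda'}\Big|>\frac12\Big]\le\mathbb P\Big[\sup_{\sum_{\lambda\in J_n}a_\lambda^2=1}\sum_{(\lambda,\lambda')\in J_n^2}|a_\lambda||a_{\lambda'}||S_{\lambda,\lambda'}|>\frac12\Big].$$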



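Let $V$ and $C$ be the following $N_n\times N_n$ square matrices:
$$V=(V_{\lambda,\lambda'})_{(\lambda,\lambda')\in J_n^2},\qquad V_{\lambda,\lambda'}=\Big(\mathbb E\big[\chi_\lambda^2(X_1,T_1)\chi_{\lambda'}^2(X_1,T_1)\big]\Big)^{1/2},$$
$$C=(c_{\lambda,\lambda'})_{(\lambda,\lambda')\in J_n^2},\qquad c_{\lambda,\lambda'}=\|\chi_\lambda\chi_{\lambda'}\|_\infty,$$
and let
$$x=\frac{3-2\sqrt2}2\min\Big(\frac1{\rho^2(V)},\frac1{\rho(C)}\Big),\tag{23}$$
where $\rho$ is the spectral radius defined in Section 2.1. Then, since $\sqrt{2x}\,\rho(V)\le\sqrt{3-2\sqrt2}=\sqrt2-1$ and $x\rho(C)\le(3-2\sqrt2)/2$, we get $\sqrt{2x}\,\rho(V)+x\rho(C)\le1/2$ and
$$\mathbb P[\Omega^c]\le\mathbb P\Big[\sup_{\sum_{\lambda\in J_n}a_\lambda^2=1}\sum_{(\lambda,\lambda')\in J_n^2}|a_\lambda||a_{\lambda'}||S_{\lambda,\lambda'}|>\sqrt{2x}\,\rho(V)+x\rho(C)\Big].$$
Besides, let $(a_\lambda)_{\lambda\in J_n}$ be such that $\sum_{\lambda\in J_n}a_\lambda^2=1$; then
$$\sqrt{2x}\,\rho(V)+x\rho(C)\ge\sup_{\sum_{\lambda}b_\lambda^2=1}\sum_{(\lambda,\lambda')\in J_n^2}|b_\lambda||b_{\lambda'}|\big(\sqrt{2x}\,V_{\lambda,\lambda'}+xc_{\lambda,\lambda'}\big)\ge\sum_{(\lambda,\lambda')\in J_n^2}|a_\lambda||a_{\lambda'}|\big(\sqrt{2x}\,V_{\lambda,\lambda'}+xc_{\lambda,\lambda'}\big).$$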
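Hence,
$$\mathbb P[\Omega^c]\le\mathbb P\Big[\exists(\lambda,\lambda')\in J_n^2,\ |S_{\lambda,\lambda'}|>\sqrt{2x}\,V_{\lambda,\lambda'}+xc_{\lambda,\lambda'}\Big]\le\sum_{(\lambda,\lambda')\in J_n^2}\mathbb P\Big[|S_{\lambda,\lambda'}|>\sqrt{2x}\,V_{\lambda,\lambda'}+xc_{\lambda,\lambda'}\Big].$$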

Finally, we use the Bernstein deviation inequality presented in Birgé and Massart [1998], Lemma 8. Then
$$\mathbb P[\Omega^c]\le\sum_{(\lambda,\lambda')\in J_n^2}2\exp(-nx)=2(N_n)^2\exp(-nx).\tag{24}$$

Besides, $\max(\rho^2(V),\rho(C))$ is upper bounded similarly to Baraud et al. [2001]. According to equations (2.9) and (2.10) in Baraud et al. [2001], under Assumptions (A2) and (A3),
$$\sup_{t\in S_n}\frac{\|t\|_\infty^2}{\|t\|_{f(X,T)}^2}\le\frac1{h_0}\sup_{t\in S_n}\frac{\|t\|_\infty^2}{\|t\|^2}=\frac1{h_0}\sup_{(x,y)\in A}\sum_{(k,l)\in J_n}\big(\varphi_k(x)\psi_l(y)\big)^2\le\frac{K^2}{h_0}\dim(S_n).$$
Then, with Lemma 2 in Baraud et al. [2001], $\max(\rho^2(V),\rho(C))\le(K^2/h_0)(N_n)^2$, which, combined with (23) and (24), concludes the proof of Lemma 6.2. □
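The arithmetic behind (23) is that $\sqrt{2\cdot\frac{3-2\sqrt2}2}=\sqrt2-1$ and $(\sqrt2-1)+\frac{3-2\sqrt2}2=\frac12$. The sketch below checks $\sqrt{2x}\,\rho(V)+x\rho(C)\le1/2$ over a grid of hypothetical spectral radii and evaluates the right-hand side of (24); it does not compute $\rho(V)$ or $\rho(C)$ from an actual basis, and the helper names are ours.

```python
import math

KAPPA = (3.0 - 2.0 * math.sqrt(2.0)) / 2.0  # constant (3 - 2 sqrt(2)) / 2 of (23)


def choose_x(rho_V, rho_C):
    """x of (23): KAPPA * min(1 / rho(V)^2, 1 / rho(C))."""
    return KAPPA * min(1.0 / rho_V ** 2, 1.0 / rho_C)


def threshold(rho_V, rho_C):
    """sqrt(2x) rho(V) + x rho(C), which the choice (23) keeps <= 1/2."""
    x = choose_x(rho_V, rho_C)
    return math.sqrt(2.0 * x) * rho_V + x * rho_C


def union_bound(N, n, rho_V, rho_C):
    """Right-hand side of (24): 2 N^2 exp(-n x)."""
    return 2.0 * N ** 2 * math.exp(-n * choose_x(rho_V, rho_C))
```

Equality $\sqrt{2x}\,\rho(V)+x\rho(C)=1/2$ holds exactly when $\rho^2(V)=\rho(C)$, for instance $\rho(V)=2$ and $\rho(C)=4$; otherwise one of the two terms is strictly smaller than its worst case.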

6.4. Proof of Proposition 4.1. The proof is based on the following theorem (see Tsybakov [2004], Chapter 2, Theorem 2.5). Let $\mathcal B=B^\beta_{2,\infty}(A,L)$. Denote by $K(P,Q)$ the Kullback distance between the distributions $P$ and $Q$:
$$K(P,Q)=\begin{cases}\displaystyle\int\log\Big(\frac{dP}{dQ}\Big)dP&\text{if }P\ll Q,\\+\infty&\text{otherwise.}\end{cases}$$



Theorem 6.1. Assume that there exist $M\ge2$ and $F_0,\dots,F_M$ such that

(1) $F_j\in\mathcal B$ for every $j\in\{0,\dots,M\}$;

(2) $\|F_j-F_l\|^2\ge2r$ for every $j\neq l\in\{0,\dots,M\}$;

(3) $P_j^{(n)}\ll P_0^{(n)}$ for every $j\in\{0,\dots,M\}$, where $P_j^{(n)}$ denotes the distribution of $(X_i,T_i,\delta_i)_{i=1,\dots,n}$ if $F=F_j$, and for some $0<\alpha<1/8$,
$$\frac1M\sum_{j=1}^MK\big(P_j^{(n)},P_0^{(n)}\big)\le\alpha\log M.$$

Then there exists a constant $c>0$ such that
$$\inf_{\hat F_n}\sup_{F\in\mathcal B}\mathbb E\big[r^{-1}\|\hat F_n-F\|^2\big]\ge c,$$
where the infimum is taken over all estimators $\hat F_n$.



We construct a set of distribution functions $\{F_0,\dots,F_M\}$ which satisfies conditions (1), (2) and (3). Up to rescalings and translations, we assume that $A=[0,1]\times[0,1]$.

6.4.1. Construction of the $(F_\varepsilon)$'s. Let
$$F_0(x,u)=\mathbb 1_{[0,1]}(x)\Big(a\,\mathbb 1_{[0,+\infty[}(u)+au\,\mathbb 1_{[0,1[}(u)+(1-a)\,\mathbb 1_{[1,+\infty[}(u)\Big)$$
with $a=\min(1/3,L/2)$. For every $x\in[0,1]$,

• $F_0(x,u)=0$ for all $u<0$,

• $F_0(x,u)=1$ for all $u\ge1$,

• $F_0(x,\cdot)$ is increasing on $[0,1]$ and $F_0(x,u)=a(1+u)\in[a,2a[\,\subset\,]0,1[$ for every $u\in[0,1[$,

thus $F_0$ is a conditional distribution function. Let $\psi$ be a one-dimensional wavelet supported on $[0,1]$. Let $J=(j_1,j_2)$ be a pair of non-negative integers to be determined later. For every $S=(s_1,s_2)\in\mathbb Z^2$, let
$$\psi_{J,S}(x,u)=2^{(j_1+j_2)/2}\,\psi(2^{j_1}x-s_1)\,\psi(2^{j_2}u-s_2).$$
There exists a subset $R_J$ of $\mathbb Z^2$ such that

• $\mathrm{Supp}(\psi_{J,S})=I_{J,S}\subset\,]0,1[^2$ for every $S\in R_J$,

• the functions $\{\psi_{J,S},S\in R_J\}$ have disjoint supports,

• $|R_J|=2^{j_1+j_2}$.



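Let $b$ be a positive constant which will be determined later. For every $\varepsilon\in\{0,1\}^{|R_J|}$, let
$$G_\varepsilon(x,u)=\sqrt{\frac bn}\sum_{S\in R_J}\varepsilon_S\,\psi_{J,S}(x,u)$$
and $F_\varepsilon=F_0+G_\varepsilon$. For every $x\in[0,1]$,

• $F_\varepsilon(x,u)=F_0(x,u)=0$ for all $u<0$,

• $F_\varepsilon(x,u)=F_0(x,u)=1$ for all $u\ge1$.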



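Moreover, let $(x,u)\in[0,1]\times[0,1[$. Since $G_\varepsilon(x,0)=0$,
$$F_\varepsilon(x,u)=a+\int_0^u\Big(a+\sqrt{\frac bn}\sum_{S\in R_J}\varepsilon_S\,\frac{\partial\psi_{J,S}}{\partial y}(x,y)\Big)dy.\tag{25}$$
Assume that
$$\sqrt{\frac bn}\,2^{(j_1+j_2)/2}\,2^{j_2}\,\|\psi\|_\infty\|\psi'\|_\infty\le a.\tag{26}$$
Let $y\in[0,u]$ and let $S_0$ be such that $(x,y)\in I_{J,S_0}$ (if there is no such $S_0$, the integrand in (25) equals $a$ at $y$). As the supports $I_{J,S}$ are disjoint, the integrand at $y$ equals $a+\varepsilon_{S_0}\sqrt{b/n}\,\partial_y\psi_{J,S_0}(x,y)$, and $|\partial_y\psi_{J,S_0}(x,y)|\le2^{(j_1+j_2)/2}\,2^{j_2}\|\psi\|_\infty\|\psi'\|_\infty$, so this quantity is nonnegative by (26).

Therefore the integrand in (25) is nonnegative and the function $F_\varepsilon(x,\cdot)$ is nondecreasing on $[0,1[$. Moreover, as $\psi_{J,S}(x,1)=0$ for every $S\in R_J$, $\lim_{u\to1^-}F_\varepsilon(x,u)=\lim_{u\to1^-}F_0(x,u)=2a<1=F_\varepsilon(x,1)$. Thus $F_\varepsilon$ is a conditional distribution function on $[0,1]^2$.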

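6.4.2. Condition which guarantees that $F_\varepsilon\in\mathcal B$ for every $\varepsilon$. On the one hand, assume that $\psi$ is regular enough; then, according to Hochmuth [2002] (Theorem 3.5),
$$|G_\varepsilon|_{B^\beta_{2,\infty}([0,1]^2)}\le\big(2^{j_1\beta_1}+2^{j_2\beta_2}\big)\|G_\varepsilon\|.$$
Moreover, as the $\{\psi_{J,S},S\in R_J\}$ have disjoint supports,
$$\|G_\varepsilon\|^2=\frac bn\sum_{S\in R_J}\varepsilon_S^2\|\psi_{J,S}\|^2.$$
By definition of the wavelets, $\|\psi_{J,S}\|=1$, hence
$$\|G_\varepsilon\|\le\sqrt{\frac bn\,|R_J|}=\sqrt{\frac bn}\,2^{(j_1+j_2)/2}.$$
Thus
$$\|G_\varepsilon\|_{B^\beta_{2,\infty}([0,1]^2)}=\|G_\varepsilon\|_{L^2([0,1]^2)}+|G_\varepsilon|_{B^\beta_{2,\infty}([0,1]^2)}\le\sqrt{\frac bn}\,2^{(j_1+j_2)/2}\big(2^{j_1\beta_1}+2^{j_2\beta_2}+1\big).$$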


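On the other hand, $|F_0|_{B^\beta_{2,\infty}([0,1]^2)}=0$. Indeed, let $r_i=\lfloor\beta_i\rfloor+1$ for $i=1$ and $2$. Then $r_1\ge1$, $r_2\ge2$ and
$$|F_0|_{B^\beta_{2,\infty}([0,1]^2)}=\sup_{t>0}\Big[t^{-\beta_1}\omega_{r_1,1}\big(F_0,t,[0,1]^2\big)+t^{-\beta_2}\omega_{r_2,2}\big(F_0,t,[0,1]^2\big)\Big].$$
Besides, let $h>0$ and
$$A_{r_1,h}=\{(x,u)\in[0,1]^2,\ (x+r_1h,u)\in[0,1]^2\}.$$
For every $(x,u)\in A_{r_1,h}$, $F_0(x+h,u)=F_0(x,u)$. So, as $r_1\ge1$, $\Delta^{r_1}_{h,1}F_0(x,u)=0$. Hence
$$\omega_{r_1,1}\big(F_0,t,[0,1]^2\big)=\sup_{0<h\le t}\big\|\Delta^{r_1}_{h,1}F_0\big\|_{L^2(A_{r_1,h})}=0.$$
Moreover, on $[0,1]^2$, $F_0(x,u)=a(1+u)$ if $u<1$ and $F_0(x,1)=1$. Thus, letting $\bar F_0(x,u)=a(1+u)$ for every $(x,u)\in[0,1]^2$, $F_0$ and $\bar F_0$ are equal on $[0,1]^2$ except on a set of measure $0$, so $\|\Delta^{r_2}_{h,2}F_0\|=\|\Delta^{r_2}_{h,2}\bar F_0\|$. Besides, for all $(x,u)\in[0,1]^2$ such that $(x,u+r_2h)\in[0,1]^2$,
$$\Delta_{h,2}\bar F_0(x,u)=ah\qquad\text{and}\qquad\Delta^{r_2}_{h,2}\bar F_0(x,u)=\Delta^{r_2-1}_{h,2}\Delta_{h,2}\bar F_0(x,u)=0,$$
as $r_2-1\ge1$. Then $\omega_{r_2,2}(F_0,t,[0,1]^2)=0$ and consequently $|F_0|_{B^\beta_{2,\infty}([0,1]^2)}=0$. Moreover
$$\|F_0\|_{L^2([0,1]^2)}^2=\int_0^1\!\!\int_0^1a^2(1+u)^2\,du\,dx=\frac{7a^2}3$$
and
$$\|F_\varepsilon\|_{B^\beta_{2,\infty}([0,1]^2)}\le\|F_0\|_{L^2([0,1]^2)}+\|G_\varepsilon\|_{B^\beta_{2,\infty}([0,1]^2)}\le a\sqrt{\frac73}+\sqrt{\frac bn}\,2^{(j_1+j_2)/2}\big(2^{j_1\beta_1}+2^{j_2\beta_2}+1\big).$$
By definition $a\le L/2$, so $a\sqrt{7/3}\le L\sqrt{7/12}$ and $\|F_\varepsilon\|_{B^\beta_{2,\infty}([0,1]^2)}\le L$ as soon as
$$\sqrt{\frac bn}\,2^{(j_1+j_2)/2}\big(2^{j_1\beta_1}+2^{j_2\beta_2}+1\big)\le L\Big(1-\sqrt{\frac7{12}}\Big).\tag{27}$$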



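6.4.3. Expression of $\|F_\varepsilon-F_{\varepsilon'}\|^2$. As the $\psi_{J,S}$ have disjoint supports and unit $L^2$ norm,
$$\|F_\varepsilon-F_{\varepsilon'}\|^2=\frac bn\sum_{S\in R_J}(\varepsilon_S-\varepsilon'_S)^2\int\psi_{J,S}^2(x,u)\,dx\,du=\frac bn\sum_{S\in R_J}\mathbb 1_{\{\varepsilon_S\neq\varepsilon'_S\}}=\frac bn\,\rho(\varepsilon,\varepsilon'),\tag{28}$$
where $\rho$ is the Hamming distance on $\{0,1\}^{|R_J|}$.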



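6.4.4. Upper bound of $K\big(P_\varepsilon^{(n)},P_0^{(n)}\big)$. For every $i\in\{1,\dots,n\}$, under $F_\varepsilon$, $(X_i,T_i,\delta_i)$ has density
$$p_\varepsilon(x,u,d)=\Big[F_\varepsilon(x,u)^d\big(1-F_\varepsilon(x,u)\big)^{1-d}\Big]f_{(X,T)}(x,u)$$
with respect to $\mathcal L\otimes\mathcal L\otimes\mu$, where $\mathcal L$ is the Lebesgue measure and $\mu$ is the counting measure on $\mathbb N$. Similarly, under $F_0$, $(X_i,T_i,\delta_i)$ has density
$$p_0(x,u,d)=\Big[F_0(x,u)^d\big(1-F_0(x,u)\big)^{1-d}\Big]f_{(X,T)}(x,u)$$
with respect to $\mathcal L\otimes\mathcal L\otimes\mu$. For every $\varepsilon\in\{0,1\}^{|R_J|}$, $P_\varepsilon$ is absolutely continuous with respect to $P_0$. Indeed,
$$F_0(x,u)=0\ \Rightarrow\ (x,u)\notin[0,1]\times[0,+\infty[\ \Rightarrow\ F_\varepsilon(x,u)=0,$$
$$F_0(x,u)=1\ \Rightarrow\ (x,u)\in[0,1]\times[1,+\infty[\ \Rightarrow\ F_\varepsilon(x,u)=1.$$
Then
$$K(P_\varepsilon,P_0)=\int_{\mathbb R^2}\Big[\log\Big(\frac{F_\varepsilon(x,u)}{F_0(x,u)}\Big)F_\varepsilon(x,u)+\log\Big(\frac{1-F_\varepsilon(x,u)}{1-F_0(x,u)}\Big)\big(1-F_\varepsilon(x,u)\big)\Big]f_{(X,T)}(x,u)\,dx\,du.$$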

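Out of the rectangles $\{I_{J,S},S\in R_J\}$, $F_\varepsilon$ and $F_0$ are equal. Hence
$$K(P_\varepsilon,P_0)=\sum_{S\in R_J}\int_{I_{J,S}}\Big[\log\Big(1+\frac{\theta_S}{F_0(x,u)}\Big)\big(F_0(x,u)+\theta_S\big)+\log\Big(1-\frac{\theta_S}{1-F_0(x,u)}\Big)\big(1-F_0(x,u)-\theta_S\big)\Big]f_{(X,T)}(x,u)\,dx\,du,$$
where $\theta_S=\varepsilon_S\sqrt{b/n}\,\psi_{J,S}(x,u)$. For every $S\in R_J$ and $(x,u)\in I_{J,S}$, $F_0(x,u)=a(1+u)$ and, as $F_\varepsilon(x,u)\in\,]0,1[$,
$$\frac{\theta_S}{a(1+u)}=\frac{F_\varepsilon(x,u)}{F_0(x,u)}-1>-1\qquad\text{and}\qquad-\frac{\theta_S}{1-a(1+u)}=\frac{1-F_\varepsilon(x,u)}{1-F_0(x,u)}-1>-1.$$
Noting that $\log(1+v)\le v$ for every $v>-1$, we obtain
$$K(P_\varepsilon,P_0)\le\sum_{S\in R_J}\int_{I_{J,S}}\Big[\theta_S+\frac{\theta_S^2}{a(1+u)}-\theta_S+\frac{\theta_S^2}{1-a(1+u)}\Big]f_{(X,T)}(x,u)\,dx\,du.$$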



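For every $u\in[0,1[$,
$$\frac1{a(1+u)}\le\frac1a\qquad\text{and}\qquad\frac1{1-a(1+u)}\le\frac1{1-2a}.$$
Thus
$$K(P_\varepsilon,P_0)\le\Big(\frac1a+\frac1{1-2a}\Big)\sum_{S\in R_J}\int_{I_{J,S}}\theta_S^2\,f_{(X,T)}(x,u)\,dx\,du\le\Big(\frac1a+\frac1{1-2a}\Big)\frac bn\,\|f_{(X,T)}\|_\infty|R_J|=a'\,\frac bn\,\|f_{(X,T)}\|_\infty|R_J|,$$
where $a'=1/a+1/(1-2a)$. Finally, as the $n$ observations are independent and identically distributed,
$$K\big(P_\varepsilon^{(n)},P_0^{(n)}\big)=nK(P_\varepsilon,P_0)\le a'b\,\|f_{(X,T)}\|_\infty\,2^{j_1+j_2}.$$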
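Pointwise, the integrand bounded in 6.4.4 is a Kullback distance between two Bernoulli laws, and the argument shows $K(\mathrm{Bern}(F_0+\theta),\mathrm{Bern}(F_0))\le\theta^2\big(1/F_0+1/(1-F_0)\big)\le a'\theta^2$ whenever $F_0\in[a,2a]$. The sketch below checks this on a grid; the value $a=0.3$ and the grid are arbitrary illustrative choices, and the function names are ours.

```python
import math


def kl_bernoulli(p, q):
    """Kullback distance K(Bern(p), Bern(q)) for p, q in (0, 1)."""
    return p * math.log(p / q) + (1.0 - p) * math.log((1.0 - p) / (1.0 - q))


def kl_upper_bound(theta, a):
    """theta^2 * a' with a' = 1/a + 1/(1 - 2a); valid when F_0 lies in [a, 2a]."""
    return theta ** 2 * (1.0 / a + 1.0 / (1.0 - 2.0 * a))
```

The inequality follows from $\log(1+v)\le v$ applied to both terms: the linear terms $\theta$ and $-\theta$ cancel and only the quadratic terms remain.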

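6.4.5. Conclusion. According to Lemma 2.7, Chapter 2 in Tsybakov [2004], there exists a family $\{\varepsilon^{(0)},\dots,\varepsilon^{(M)}\}\subset\{0,1\}^{|R_J|}$ with $\varepsilon^{(0)}=(0,\dots,0)$ such that
$$\rho\big(\varepsilon^{(j)},\varepsilon^{(l)}\big)\ge\frac{|R_J|}8=\frac{2^{j_1+j_2}}8\qquad\forall j\neq l\in\{0,\dots,M\},$$
and $\log(M)\ge(\log2/8)\,2^{j_1+j_2}$, where the distance $\rho$ is defined in (28).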
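Lemma 2.7 of Tsybakov [2004] (the Varshamov-Gilbert bound) is an existence statement. As an illustration only, the sketch below exhibits such a family for one small hypothetical value $m=|R_J|=32$ by a greedy lexicographic search, which is a constructive technique different from the counting argument behind the lemma. Vectors of $\{0,1\}^m$ are encoded as integers, and the helper names are ours.

```python
import math


def hamming(a, b):
    """Hamming distance between two 0/1 vectors encoded as integers."""
    return bin(a ^ b).count("1")


def greedy_code(m, min_dist, size):
    """Greedy (lexicographic) search for `size` vectors of {0,1}^m, starting
    from epsilon^(0) = (0, ..., 0), with pairwise Hamming distance >= min_dist.
    Returns fewer vectors if the whole space is exhausted first."""
    code = [0]
    x = 1
    while len(code) < size and x < 2 ** m:
        if all(hamming(x, c) >= min_dist for c in code):
            code.append(x)
        x += 1
    return code
```

For $m=32$, asking for $2^{m/8}+1=17$ vectors at pairwise distance at least $m/8=4$ gives a family $\varepsilon^{(0)},\dots,\varepsilon^{(M)}$ with $M=16$, hence $\log M=(\log2/8)\,m$.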

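Now the parameters $B_0$, $b$, $j_1$ and $j_2$ are chosen so that the family $\{F_{\varepsilon^{(0)}},\dots,F_{\varepsilon^{(M)}}\}$ satisfies the assumptions of Theorem 6.1 with $r=B_0\,n^{-2\beta_1\beta_2/(\beta_1+\beta_2+2\beta_1\beta_2)}$. Let
$$b=\frac{\log2}{72\,\|f_{(X,T)}\|_\infty\,a'},\qquad c_0=\Big(\frac L{4\sqrt b}\Big(1-\sqrt{\frac7{12}}\Big)\Big)^{1/(1+\beta_1+\beta_2)}\qquad\text{and}\qquad B_0=\frac{bc_0^2}{64}.$$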



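Let $j_1$ and $j_2$ be in $\mathbb N^*$ such that
$$\frac{c_0}2\,n^{\beta_2/(\beta_1+\beta_2+2\beta_1\beta_2)}\le2^{j_1}\le c_0\,n^{\beta_2/(\beta_1+\beta_2+2\beta_1\beta_2)},\qquad\frac{c_0}2\,n^{\beta_1/(\beta_1+\beta_2+2\beta_1\beta_2)}\le2^{j_2}\le c_0\,n^{\beta_1/(\beta_1+\beta_2+2\beta_1\beta_2)}.\tag{29}$$
The existence of $j_1$ and $j_2$ is guaranteed for $n$ larger than an integer $n_0$ depending on $(c_0,\beta)$. Then for every $j\neq l\in\{0,\dots,M\}$,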

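$$\|F_{\varepsilon^{(j)}}-F_{\varepsilon^{(l)}}\|^2\ge\frac bn\cdot\frac{2^{j_1+j_2}}8\ge\frac{bc_0^2}{32n}\,n^{(\beta_1+\beta_2)/(\beta_1+\beta_2+2\beta_1\beta_2)}=2B_0\,n^{-2\beta_1\beta_2/(\beta_1+\beta_2+2\beta_1\beta_2)},$$
hence condition (2) in Theorem 6.1 is satisfied.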
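The exponents in (29) and in the separation bound can be checked numerically. The sketch below uses hypothetical values $n=10^6$, $\beta_1=1.5$, $\beta_2=2$, $c_0=1.3$ and $b=0.01$ (not derived from any $L$ or $f_{(X,T)}$) and compares $\frac bn\,2^{j_1+j_2}/8$ with $\frac{bc_0^2}{32n}\,n^{(\beta_1+\beta_2)/(\beta_1+\beta_2+2\beta_1\beta_2)}$; the helper names are ours.

```python
import math


def choose_levels(n, beta1, beta2, c0):
    """Hypothetical helper: pick (j1, j2) satisfying (29), i.e.
    (c0/2) n^e1 < 2^j1 <= c0 n^e1 and (c0/2) n^e2 < 2^j2 <= c0 n^e2 with
    D = beta1 + beta2 + 2 beta1 beta2, e1 = beta2 / D and e2 = beta1 / D.
    Taking j = floor(log2(y)) gives y/2 < 2^j <= y; n must be large enough
    for both levels to be >= 1."""
    D = beta1 + beta2 + 2.0 * beta1 * beta2
    j1 = math.floor(math.log2(c0 * n ** (beta2 / D)))
    j2 = math.floor(math.log2(c0 * n ** (beta1 / D)))
    return j1, j2


def separation(n, b, j1, j2):
    """(b/n) * 2^(j1 + j2) / 8: by (28) and the Varshamov-Gilbert family,
    a lower bound on the squared distance between two distinct F_eps."""
    return b / n * 2 ** (j1 + j2) / 8
```

Since $(\beta_1+\beta_2)/(\beta_1+\beta_2+2\beta_1\beta_2)-1=-2\beta_1\beta_2/(\beta_1+\beta_2+2\beta_1\beta_2)$, the right-hand side decays at the anisotropic rate appearing in Proposition 4.1.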
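Moreover
$$\frac1M\sum_{j=1}^MK\big(P^{(n)}_{\varepsilon^{(j)}},P^{(n)}_{\varepsilon^{(0)}}\big)\le a'b\,\|f_{(X,T)}\|_\infty\,2^{j_1+j_2}=\frac{\log2}{72}\,2^{j_1+j_2}\le\frac19\log M,$$
hence condition (3) in Theorem 6.1 is satisfied with $\alpha=1/9$.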



Finally, condition (1) in Theorem 6.1 is satisfied as soon as (26) and (27) hold. Besides, $\beta_1>0$ and $\beta_2>1$, and by (29) $j_1$ and $j_2$ are increasing with $n$. Therefore $2^{(j_1+j_2)/2}\big(2^{j_1\beta_1}+2^{j_2\beta_2}+1\big)$ increases faster than $2^{(j_1+j_2)/2}\,2^{j_2}$ and, for $n$ larger than an integer $n_1$ depending on $\psi$ and $L$, (26) holds as soon as (27) holds. Moreover, by (29), (27) holds as soon as
$$\sqrt b\,c_0\big(c_0^{\beta_1}+c_0^{\beta_2}\big)+\sqrt b\,c_0\,n^{-\beta_1\beta_2/(\beta_1+\beta_2+2\beta_1\beta_2)}\le L\Big(1-\sqrt{\frac7{12}}\Big),$$
which is ensured if
$$\sqrt b\,c_0\big(c_0^{\beta_1}+c_0^{\beta_2}\big)\le\frac L2\Big(1-\sqrt{\frac7{12}}\Big)\tag{30}$$
and
$$\sqrt b\,c_0\,n^{-\beta_1\beta_2/(\beta_1+\beta_2+2\beta_1\beta_2)}\le\frac L2\Big(1-\sqrt{\frac7{12}}\Big).\tag{31}$$
On the one hand, (30) is guaranteed by the definition of $c_0$. On the other hand, there exists an integer $n_2$ depending on $(\beta,c_0)$ such that (31) is satisfied for every $n\ge n_2$.

Thus, for every $n\ge\max(n_0,n_1,n_2)$, conditions (1), (2) and (3) in Theorem 6.1 hold with $r=B_0\,n^{-2\beta_1\beta_2/(\beta_1+\beta_2+2\beta_1\beta_2)}$, which concludes the proof of Proposition 4.1. □

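7. Appendix

7.1. Talagrand Inequality. We use the following version of Talagrand's inequality.

Theorem 7.1. Let $(V_1,\dots,V_n)$ be independent random variables, and $\mathcal F$ be a countable set of functions from $\mathbb R$ to $\mathbb R^n$ such that $-\mathcal F=\mathcal F$. Let
$$Z=\sup_{f\in\mathcal F}\frac1n\sum_{i=1}^n\Big(f^{(i)}(V_i)-\mathbb E\big[f^{(i)}(V_i)\big]\Big)$$
and $b$, $v$ and $H$ be such that
$$\sup_{f\in\mathcal F}\Big(\sup_{i=1,\dots,n}\|f^{(i)}\|_\infty\Big)\le b,\qquad\sup_{f\in\mathcal F}\frac1n\sum_{i=1}^n\mathrm{Var}\big(f^{(i)}(V_i)\big)\le v\qquad\text{and}\qquad\mathbb EZ\le H.$$
Then, for every $\theta>1$, there exist $C,C',\kappa,\kappa'$ such that for every $n$,
$$\mathbb E\big[(Z^2-\theta H^2)_+\big]\le C\,\frac vn\exp\Big(-\kappa\frac{nH^2}v\Big)+C'\,\frac{b^2}{n^2}\exp\Big(-\kappa'\frac{nH}b\Big).$$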

Theorem 7.1 is derived from Theorem 1.1 in Klein and Rio [2005] by applying it to the functions $s^{(i)}(t)=f^{(i)}(t)-\mathbb E\big[f^{(i)}(V_i)\big]$, suitably normalized. Similarly to Birgé and Massart [1998], Corollary 2, we get, for every $\eta>0$ and $x>0$,
$$\mathbb P\big[Z\ge(1+\eta)H+x\big]\le\exp\Big(-\frac n3\min\Big(\frac{x^2}{2v},\frac{\min(1,\eta)\,x}{7b}\Big)\Big).$$
Then we use the formula $\mathbb E[X_+]=\int_0^{+\infty}\mathbb P[X>s]\,ds$ to get
$$\mathbb E\big[(Z^2-\theta H^2)_+\big]=\int_0^{+\infty}\mathbb P\Big[Z>\sqrt{\theta H^2+s}\Big]\,ds\le\int_0^{+\infty}\mathbb P\Big[Z>(1+a_1)H+a_2H+\sqrt{a_3s}\Big]\,ds$$
for some positive $(a_1,a_2,a_3)$, and a simple integration provides the result of Theorem 7.1.
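The identity $\mathbb E[X_+]=\int_0^{+\infty}\mathbb P[X>s]\,ds$ used above can be checked numerically. The sketch below does so for two laws with known answers: a standard Gaussian ($\mathbb E[X_+]=1/\sqrt{2\pi}$) and a shifted exponential ($X=E-c$ with $E\sim\mathrm{Exp}(1)$, so that $\mathbb E[X_+]=e^{-c}$). The truncation point, step count and function names are arbitrary choices of ours.

```python
import math


def mean_positive_part(tail, upper, steps):
    """Trapezoidal approximation of E[X_+] = int_0^inf P[X > s] ds,
    truncated at `upper`; `tail(s)` must return P[X > s]."""
    h = upper / steps
    total = 0.5 * (tail(0.0) + tail(upper))
    for k in range(1, steps):
        total += tail(k * h)
    return total * h


def normal_tail(s):
    """P[X > s] for a standard Gaussian X."""
    return 0.5 * math.erfc(s / math.sqrt(2.0))
```

In the proof, the same formula is applied to $X=Z^2-\theta H^2$, and the tail bound above is integrated in $s$.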
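7.2. Linear algebra lemma.

Lemma 7.1. Let $V=\mathrm{Vect}(\xi_1,\dots,\xi_D)$ be a linear subspace of a vector space $E$. Let $\langle s,t\rangle_0$ be a symmetric nonnegative bilinear form on $E$ (a scalar product that may be degenerate), and $\|t\|_0=\sqrt{\langle t,t\rangle_0}$ the corresponding semi-norm. There exists a basis $(\varphi_1,\dots,\varphi_D)$ of $V$ which is orthogonal for $\langle\cdot,\cdot\rangle_0$.

Proof of Lemma 7.1. The proof follows exactly the Gram-Schmidt orthogonalisation procedure, except that the semi-norm $\|\cdot\|_0$ may vanish on nonzero vectors of $V$.

• Let $\varphi_1=\xi_1$.

• For every $k\in\{1,\dots,D-1\}$, we set $\varphi_{k+1}=\xi_{k+1}+\sum_{j=1}^ka_j\varphi_j$, where
$$a_j=\begin{cases}0&\text{if }\|\varphi_j\|_0=0,\\-\langle\xi_{k+1},\varphi_j\rangle_0/\|\varphi_j\|_0^2&\text{otherwise.}\end{cases}$$

Thus, for every $k\in\{1,\dots,D\}$, $\mathrm{Vect}(\xi_1,\dots,\xi_k)=\mathrm{Vect}(\varphi_1,\dots,\varphi_k)$, and the $\varphi_j$'s are orthogonal for the $\|\cdot\|_0$ semi-norm (when $\|\varphi_j\|_0=0$, the Cauchy-Schwarz inequality gives $\langle\xi_{k+1},\varphi_j\rangle_0=0$). □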
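The procedure of Lemma 7.1 can be sketched as follows, assuming for illustration that $E$ is a space of polynomials stored as coefficient lists and that $\langle s,t\rangle_0=\frac1n\sum_i s(x_i)t(x_i)$ for a few points $x_i$, which is the kind of degenerate scalar product met in the proof of Lemma 6.1. Exact arithmetic with Python's `fractions.Fraction` avoids rounding in the orthogonality checks; all function names are hypothetical.

```python
from fractions import Fraction


def poly_eval(coeffs, x):
    """Evaluate sum_k coeffs[k] * x**k."""
    return sum(c * x ** k for k, c in enumerate(coeffs))


def make_semi_inner(points):
    """<s, t>_0 = (1/n) sum_i s(x_i) t(x_i): degenerate when n is small."""
    n = len(points)

    def inner(s, t):
        return sum(poly_eval(s, x) * poly_eval(t, x) for x in points) / n

    return inner


def add_scaled(s, t, c):
    """Coefficient list of s + c * t (lists may have different lengths)."""
    size = max(len(s), len(t))
    s = list(s) + [0] * (size - len(s))
    t = list(t) + [0] * (size - len(t))
    return [si + c * ti for si, ti in zip(s, t)]


def gram_schmidt_semi(xis, inner):
    """Gram-Schmidt as in the proof of Lemma 7.1: a_j = 0 when ||phi_j||_0 = 0."""
    phis = []
    for xi in xis:
        phi = list(xi)
        for prev in phis:
            norm2 = inner(prev, prev)
            if norm2 != 0:
                phi = add_scaled(phi, prev, -inner(xi, prev) / norm2)
        phis.append(phi)
    return phis
```

For instance, with the two points $0$ and $1$ and $(\xi_1,\xi_2,\xi_3)=(1,x,x^2)$, the procedure returns $1$, $x-\frac12$ and $x^2-x$. The last polynomial has zero semi-norm although it is nonzero, so in the notation of the proof of Lemma 6.1 it would not belong to $\Gamma$.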